POLYPEPTIDES HAVING CELLOBIOHYDROLASE II ACTIVITY
AND POLYNUCLEOTIDES ENCODING SAME 

Field of the invention 

The present invention relates to polypeptides having cellobiohydrolase II (also referred
to as CBH II or CBH 2) activity and to polynucleotides having a nucleotide sequence which
encodes the polypeptides. The invention also relates to nucleic acid constructs, vectors,
and host cells comprising the nucleic acid constructs, as well as to methods for producing and
using the polypeptides.



Background of the Invention 

Cellulose is an important industrial raw material and a source of renewable energy. The
physical structure and morphology of native cellulose are complex, and the fine details of its
structure have been difficult to determine experimentally. However, the chemical composition
of cellulose is simple, consisting of D-glucose residues linked by beta-1,4-glycosidic bonds to
form linear polymers with chain lengths of over 10,000 glycosidic residues.

In order to be efficient, the digestion of cellulose requires several types of enzymes
acting cooperatively. At least three categories of enzymes are necessary to convert cellulose
into glucose: endo-1,4-beta-D-glucanases (EC 3.2.1.4), which cut the cellulose chains at
random; cellobiohydrolases (EC 3.2.1.91), which cleave cellobiosyl units from the cellulose
chain ends; and beta-glucosidases (EC 3.2.1.21), which convert cellobiose and soluble
cellodextrins into glucose. Among these three categories of enzymes involved in the
biodegradation of cellulose, cellobiohydrolases are the key enzymes for the degradation of
native crystalline cellulose.

Exo-cellobiohydrolases (cellobiohydrolase II, or CBH II) here refer to the cellobiohydrolases
which degrade cellulose by hydrolyzing cellobiose from the reducing end of the cellulose
polymer chains. The cellobiohydrolase II group belongs to the same EC group, EC
3.2.1.91, as the cellobiohydrolase I group, the difference being that cellobiohydrolase I
degrades cellulose by hydrolyzing cellobiose from the non-reducing end of the cellulose
polymer chains.

It is an object of the present invention to provide improved polypeptides having
cellobiohydrolase II activity and polynucleotides encoding the polypeptides. The improved
polypeptides may have improved specific activity and/or improved stability, in particular
improved thermostability. The polypeptides may also have an improved ability to resist
inhibition by cellobiose.
Summary of the Invention

In a first aspect the present invention relates to a polypeptide having cellobiohydrolase 
II activity, selected from the group consisting of: 



(a) a polypeptide comprising an amino acid sequence selected from the group consisting 
of: 

an amino acid sequence which has at least 75% identity with the amino acid sequence
shown as amino acids 1 to 477 of SEQ ID NO:2,

a polypeptide comprising an amino acid sequence selected from the group consisting 
of: 

an amino acid sequence which has at least 85% identity with the partial amino acid 
sequence shown as amino acids 1 to 82 of SEQ ID NO:4, 

a polypeptide comprising an amino acid sequence selected from the group consisting 
of: 

an amino acid sequence which has at least 85% identity with the partial amino acid
sequence shown as amino acids 1 to 420 of SEQ ID NO:4,

a polypeptide comprising an amino acid sequence selected from the group consisting
of:

an amino acid sequence which has at least 80% identity with the partial amino acid
sequence shown as amino acids 1 to 139 of SEQ ID NO:6,

a polypeptide comprising an amino acid sequence selected from the group consisting
of:

an amino acid sequence which has at least 95% identity with the partial amino acid
sequence shown as amino acids 1 to 102 of SEQ ID NO:8,

a polypeptide comprising an amino acid sequence selected from the group consisting
of:

an amino acid sequence which has at least 85% identity with the partial amino acid
sequence shown as amino acids 1 to 144 of SEQ ID NO:10,

a polypeptide comprising an amino acid sequence selected from the group consisting
of:

an amino acid sequence which has at least 75% identity with the partial amino acid
sequence shown as amino acids 1 to 99 of SEQ ID NO:12,

a polypeptide comprising an amino acid sequence selected from the group consisting
of:

an amino acid sequence which has at least 85% identity with the partial amino acid
sequence shown as amino acids 1 to 140 of SEQ ID NO:14,

a polypeptide comprising an amino acid sequence selected from the group consisting 
of:

an amino acid sequence which has at least 75% identity with the partial amino acid
sequence shown as amino acids 1 to 109 of SEQ ID NO:16,

a polypeptide comprising an amino acid sequence selected from the group consisting
of:

an amino acid sequence which has at least 75% identity with the amino acid sequence
shown as SEQ ID NO:16,

a polypeptide comprising an amino acid sequence selected from the group consisting
of:

an amino acid sequence which has at least 75% identity with the partial amino acid
sequence shown as amino acids 1 to 143 of SEQ ID NO:18,

a polypeptide comprising an amino acid sequence selected from the group consisting
of:

an amino acid sequence which has at least 70% identity with the partial amino acid
sequence shown as amino acids 1 to 71 of SEQ ID NO:20,

a polypeptide comprising an amino acid sequence selected from the group consisting
of:

an amino acid sequence which has at least 60% identity with the amino acid sequence
shown as amino acids 1 to 220 of SEQ ID NO:22,

a polypeptide comprising an amino acid sequence selected from the group consisting
of:

an amino acid sequence which has at least 65% identity with the amino acid sequence
shown as amino acids 1 to 458 of SEQ ID NO:24, and

a polypeptide comprising an amino acid sequence selected from the group consisting
of:

an amino acid sequence which has at least 70% identity with the amino acid sequence
shown as amino acids 1 to 390 of SEQ ID NO:26.



(b) a polypeptide comprising an amino acid sequence selected from the group consisting
of:

an amino acid sequence which has at least 75% identity with the polypeptide encoded
by the cellobiohydrolase II encoding part of the nucleotide sequence present in
Chaetomium thermophilum,

a polypeptide comprising an amino acid sequence selected from the group consisting
of:

an amino acid sequence which has at least 85% identity with the polypeptide encoded
by the cellobiohydrolase II encoding part of the nucleotide sequence present in
Myceliophthora thermophila,

a polypeptide comprising an amino acid sequence selected from the group consisting 
of: 

an amino acid sequence which has at least 80% identity with the polypeptide encoded
by the cellobiohydrolase II encoding part of the nucleotide sequence present in
Acremonium thermophilum,

an amino acid sequence which has at least 95% identity with the polypeptide encoded
by the cellobiohydrolase II encoding part of the nucleotide sequence present in
Melanocarpus sp.,

an amino acid sequence which has at least 85% identity with the polypeptide encoded
by the cellobiohydrolase II encoding part of the nucleotide sequence present in
Thielavia microspora,

an amino acid sequence which has at least 75% identity with the polypeptide encoded
by the cellobiohydrolase II encoding part of the nucleotide sequence present in
Aspergillus sp.,

an amino acid sequence which has at least 85% identity with the polypeptide encoded
by the cellobiohydrolase II encoding part of the nucleotide sequence present in
Thielavia australiensis,

an amino acid sequence which has at least 75% identity with the polypeptide encoded
by the cellobiohydrolase II encoding part of the nucleotide sequence present in
Aspergillus tubingensis,

an amino acid sequence which has at least 75% identity with the polypeptide encoded
by the cellobiohydrolase II encoding part of the nucleotide sequence present in
Gloeophyllum trabeum,

an amino acid sequence which has at least 70% identity with the polypeptide encoded
by the cellobiohydrolase II encoding part of the nucleotide sequence present in
Meripilus giganteus,

an amino acid sequence which has at least 60% identity with the polypeptide encoded
by the cellobiohydrolase II encoding part of the nucleotide sequence present in
Trichophaea saccata,

an amino acid sequence which has at least 65% identity with the polypeptide encoded
by the cellobiohydrolase II encoding part of the nucleotide sequence present in
Stilbella annulata, and

an amino acid sequence which has at least 70% identity with the polypeptide encoded
by the cellobiohydrolase II encoding part of the nucleotide sequence present in
Malbranchea cinnamomea.
(c) a polypeptide comprising an amino acid sequence selected from the group consisting
of:

an amino acid sequence which has at least 75% identity with the polypeptide encoded
by nucleotides 63 to 1493 of SEQ ID NO:1,

a polypeptide comprising an amino acid sequence selected from the group consisting
of:

an amino acid sequence which has at least 85% identity with the polypeptide encoded
by nucleotides 1 to 246 of SEQ ID NO:3,

a polypeptide comprising an amino acid sequence selected from the group consisting
of:

an amino acid sequence which has at least 85% identity with the polypeptide encoded
by nucleotides 1 to 1272 of SEQ ID NO:3,

a polypeptide comprising an amino acid sequence selected from the group consisting
of:

an amino acid sequence which has at least 80% identity with the polypeptide encoded
by nucleotides 1 to 417 of SEQ ID NO:5,

a polypeptide comprising an amino acid sequence selected from the group consisting
of:

an amino acid sequence which has at least 95% identity with the polypeptide encoded
by nucleotides 1 to 306 of SEQ ID NO:7,

a polypeptide comprising an amino acid sequence selected from the group consisting
of:

an amino acid sequence which has at least 85% identity with the polypeptide encoded
by nucleotides 1 to 432 of SEQ ID NO:9,

a polypeptide comprising an amino acid sequence selected from the group consisting
of:

an amino acid sequence which has at least 75% identity with the polypeptide encoded
by nucleotides 1 to 297 of SEQ ID NO:11,

a polypeptide comprising an amino acid sequence selected from the group consisting
of:

an amino acid sequence which has at least 85% identity with the polypeptide encoded
by nucleotides 1 to 420 of SEQ ID NO:13,

a polypeptide comprising an amino acid sequence selected from the group consisting
of:

an amino acid sequence which has at least 75% identity with the polypeptide encoded
by nucleotides 1 to 330 of SEQ ID NO:15,

a polypeptide comprising an amino acid sequence selected from the group consisting 
of: 

an amino acid sequence which has at least 75% identity with the polypeptide encoded
by nucleotides 1 to 1221 of SEQ ID NO:15,

a polypeptide comprising an amino acid sequence selected from the group consisting
of:

an amino acid sequence which has at least 75% identity with the polypeptide encoded
by nucleotides 1 to 429 of SEQ ID NO:17,

a polypeptide comprising an amino acid sequence selected from the group consisting
of:

an amino acid sequence which has at least 70% identity with the polypeptide encoded
by nucleotides 1 to 213 of SEQ ID NO:19,

a polypeptide comprising an amino acid sequence selected from the group consisting
of:

an amino acid sequence which has at least 60% identity with the polypeptide encoded
by nucleotides 43 to 701 of SEQ ID NO:21,

a polypeptide comprising an amino acid sequence selected from the group consisting
of:

an amino acid sequence which has at least 65% identity with the polypeptide encoded
by nucleotides 21 to 1394 of SEQ ID NO:23, and

a polypeptide comprising an amino acid sequence selected from the group consisting
of:

an amino acid sequence which has at least 70% identity with the polypeptide encoded
by nucleotides 41 to 1210 of SEQ ID NO:25.

(d) a polypeptide which is encoded by a nucleotide sequence which hybridizes under high
stringency conditions with a polynucleotide probe selected from the group consisting of:

(i) the complementary strand of the nucleotides selected from the group consisting of:
nucleotides 63 to 1493 of SEQ ID NO:1,
nucleotides 1 to 246 of SEQ ID NO:3,
nucleotides 1 to 1272 of SEQ ID NO:3,
nucleotides 1 to 417 of SEQ ID NO:5,
nucleotides 1 to 306 of SEQ ID NO:7,
nucleotides 1 to 432 of SEQ ID NO:9,
nucleotides 1 to 297 of SEQ ID NO:11,
nucleotides 1 to 420 of SEQ ID NO:13,
nucleotides 1 to 330 of SEQ ID NO:15,
nucleotides 1 to 1221 of SEQ ID NO:15,
nucleotides 1 to 429 of SEQ ID NO:17,
nucleotides 1 to 213 of SEQ ID NO:19,
nucleotides 43 to 701 of SEQ ID NO:21,
nucleotides 21 to 1394 of SEQ ID NO:23, and
nucleotides 41 to 1210 of SEQ ID NO:25,

(ii) the complementary strand of the nucleotides selected from the group consisting of:
nucleotides 63 to 563 of SEQ ID NO:1,
nucleotides 43 to 543 of SEQ ID NO:21,
nucleotides 21 to 521 of SEQ ID NO:23, and
nucleotides 41 to 541 of SEQ ID NO:25,

(iii) the complementary strand of the nucleotides selected from the group consisting of:
nucleotides 63 to 263 of SEQ ID NO:1,
nucleotides 1 to 200 of SEQ ID NO:3,
nucleotides 1 to 200 of SEQ ID NO:5,
nucleotides 1 to 200 of SEQ ID NO:7,
nucleotides 1 to 200 of SEQ ID NO:9,
nucleotides 1 to 200 of SEQ ID NO:11,
nucleotides 1 to 200 of SEQ ID NO:13,
nucleotides 1 to 200 of SEQ ID NO:15,
nucleotides 1 to 1221 of SEQ ID NO:15,
nucleotides 1 to 200 of SEQ ID NO:17,
nucleotides 1 to 200 of SEQ ID NO:19,
nucleotides 43 to 243 of SEQ ID NO:21,
nucleotides 21 to 221 of SEQ ID NO:23, and
nucleotides 41 to 241 of SEQ ID NO:25.

(e) a fragment of (a), (b) or (c) that has cellobiohydrolase II activity.
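The nucleotide ranges in (c) and (d) are given as 1-based, inclusive coordinates, and the probes in (d) are the complementary strands of such ranges. The Python sketch below shows one way to extract a range, count the codons it spans, and write out its complementary strand. It assumes plain A/C/G/T sequences and reads each range in frame from its first nucleotide; the helper names are illustrative only, and the toy sequence is hypothetical rather than taken from the sequence listing.

```python
# Complement table for unambiguous DNA bases only (assumption: no IUPAC ambiguity codes).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_range(seq: str, start: int, end: int) -> str:
    """Return nucleotides start..end of seq, using 1-based inclusive coordinates."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"range {start}-{end} lies outside a sequence of length {len(seq)}")
    return seq[start - 1:end]


def codon_count(start: int, end: int) -> int:
    """Number of complete codons in a 1-based inclusive range read in frame from start."""
    return (end - start + 1) // 3


def complementary_strand(seq: str) -> str:
    """Complementary strand of seq, written 5'->3' (i.e. the reverse complement)."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # 1493 - 63 + 1 = 1431 nucleotides, i.e. 477 codons.
    print(codon_count(63, 1493))

    toy = "GGATGCCCTAA"  # hypothetical toy sequence, not from the sequence listing
    target = extract_range(toy, 3, 8)  # nucleotides 3 to 8 -> "ATGCCC"
    print(target, complementary_strand(target))  # ATGCCC GGGCAT
```

For example, nucleotides 63 to 1493 of SEQ ID NO:1 span 1431 nucleotides, i.e. 477 codons, the same count as amino acids 1 to 477 of SEQ ID NO:2.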
In a second aspect the present invention relates to a polynucleotide having a nucleotide
sequence which encodes the polypeptide of the invention.

In a third aspect the present invention relates to a nucleic acid construct comprising the
nucleotide sequence which encodes the polypeptide of the invention, operably linked to
one or more control sequences that direct the production of the polypeptide in a suitable host.

In a fourth aspect the present invention relates to a recombinant expression vector 
comprising the nucleic acid construct of the invention. 

In a fifth aspect the present invention relates to a recombinant host cell comprising the 
nucleic acid construct of the invention. 
In a sixth aspect the present invention relates to a method for producing a polypeptide
of the invention, the method comprising:

(a) cultivating a strain, which in its wild-type form is capable of producing the
polypeptide, to produce the polypeptide; and

(b) recovering the polypeptide.

In a seventh aspect the present invention relates to a method for producing a
polypeptide of the invention, the method comprising:

(a) cultivating a recombinant host cell of the invention under conditions conducive to
production of the polypeptide; and

(b) recovering the polypeptide.

In an eighth aspect the present invention relates to a method for in situ production of a
polypeptide of the invention, the method comprising:

(a) cultivating a recombinant host cell of the invention under conditions conducive to
production of the polypeptide; and

(b) contacting the polypeptide with a desired substrate without prior recovery of the
polypeptide.

Other aspects of the present invention will be apparent from the description below and
from the appended claims.

Definitions 

Prior to discussing the present invention in further detail, the following terms and
conventions will first be defined:

Substantially pure polypeptide: In the present context, the term "substantially pure
polypeptide" means a polypeptide preparation which contains at the most 10% by weight of
other polypeptide material with which it is natively associated (lower percentages of other
polypeptide material are preferred, e.g. at the most 8% by weight, at the most 6% by weight,
at the most 5% by weight, at the most 4% by weight, at the most 3% by weight, at the most 2% by
weight, at the most 1% by weight, and at the most 0.5% by weight). Thus, it is preferred that
the substantially pure polypeptide is at least 92% pure, i.e. that the polypeptide constitutes at
least 92% by weight of the total polypeptide material present in the preparation, and higher
percentages are preferred, such as at least 94% pure, at least 95% pure, at least 96% pure,
at least 97% pure, at least 98% pure, at least 99% pure, and at least 99.5% pure. The
polypeptides disclosed herein are preferably in a substantially pure form. In particular, it is
preferred that the polypeptides disclosed herein are in "essentially pure form", i.e. that the
polypeptide preparation is essentially free of other polypeptide material with which it is
natively associated. This can be accomplished, for example, by preparing the polypeptide by
means of well-known recombinant methods. Herein, the term "substantially pure polypeptide"
is synonymous with the terms "isolated polypeptide" and "polypeptide in isolated form".
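The weight-based purity criterion above reduces to a short calculation. The Python sketch below assumes that the masses of the target polypeptide and of the other, natively associated polypeptide material in a preparation are already known (how they are quantified is outside the scope of this sketch); the function names and the example masses are illustrative only.

```python
def purity_percent(target_mg: float, other_mg: float) -> float:
    """Target polypeptide as a weight percentage of all polypeptide material present."""
    if target_mg < 0 or other_mg < 0:
        raise ValueError("masses must be non-negative")
    total = target_mg + other_mg
    if total == 0:
        raise ValueError("no polypeptide material present")
    return 100.0 * target_mg / total


def is_substantially_pure(target_mg: float, other_mg: float,
                          max_other_percent: float = 10.0) -> bool:
    """True if other polypeptide material makes up at most max_other_percent by weight."""
    return 100.0 - purity_percent(target_mg, other_mg) <= max_other_percent


if __name__ == "__main__":
    # Hypothetical preparation: 9.5 mg target polypeptide, 0.5 mg other polypeptide material.
    print(purity_percent(9.5, 0.5))         # 95.0
    print(is_substantially_pure(9.5, 0.5))  # True
```

Passing a smaller `max_other_percent` (for example 8.0 or 0.5) checks the stricter preferred limits in the same way.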

Cellobiohydrolase II activity: The term "cellobiohydrolase II activity" is defined herein as
a cellulose 1,4-beta-cellobiosidase (also referred to as exo-glucanase, exo-cellobiohydrolase
or 1,4-beta-cellobiohydrolase) activity, as defined in enzyme class EC 3.2.1.91 or CAZy
Glycoside Hydrolase Family 6, which catalyzes the hydrolysis of 1,4-beta-D-glucosidic
linkages in cellulose and cellotetraose, releasing cellobiose from the reducing ends of the
chains.

For purposes of the present invention, cellobiohydrolase II activity may be determined
according to the procedure described in Example 2.

In an embodiment, cellobiohydrolase II activity may be determined according to the
procedure described in Deshpande MV et al., Methods in Enzymology, pp. 126-130 (1988):
"Selective Assay for Exo-1,4-Beta-Glucanases". According to this procedure, one unit of
cellobiohydrolase II activity (agluconic bond cleavage activity) is defined as 1.0 micromole of
p-nitrophenol produced per minute at 50°C, pH 5.0. The polypeptides of the present invention
should preferably have at least 20% of the cellobiohydrolase II activity of a polypeptide
consisting of an amino acid sequence selected from the group consisting of SEQ ID NO:2,
SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12,
SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID
NO:24, and SEQ ID NO:26. In a particularly preferred embodiment, the polypeptides should
have at least 40%, such as at least 50%, preferably at least 60%, such as at least 70%, more
preferably at least 80%, such as at least 90%, most preferably at least 95%, such as about or
at least 100% of the cellobiohydrolase II activity of the polypeptide consisting of the amino
acid sequence selected from the group consisting of:

amino acids 1 to 477 of SEQ ID NO:2,
amino acids 1 to 82 of SEQ ID NO:4,
amino acids 1 to 420 of SEQ ID NO:4,
amino acids 1 to 139 of SEQ ID NO:6,
amino acids 1 to 102 of SEQ ID NO:8,
amino acids 1 to 144 of SEQ ID NO:10,
amino acids 1 to 99 of SEQ ID NO:12,
amino acids 1 to 140 of SEQ ID NO:14,
amino acids 1 to 109 of SEQ ID NO:16,
amino acids 1 to 407 of SEQ ID NO:16,
amino acids 1 to 143 of SEQ ID NO:18,
amino acids 1 to 71 of SEQ ID NO:20,
amino acids 1 to 220 of SEQ ID NO:22,
amino acids 1 to 458 of SEQ ID NO:24, and
amino acids 1 to 390 of SEQ ID NO:26.
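As a worked illustration of the unit definition and the relative-activity thresholds above, the Python sketch below converts an amount of released p-nitrophenol into activity units and expresses a candidate polypeptide's activity as a percentage of a reference polypeptide's activity. It assumes both assays use the same amount of enzyme under identical assay conditions; the function names and the numbers are hypothetical, not measured data.

```python
def activity_units(pnp_umol: float, minutes: float) -> float:
    """Activity in units, where 1 unit = 1.0 micromole p-nitrophenol produced per minute."""
    if minutes <= 0:
        raise ValueError("reaction time must be positive")
    return pnp_umol / minutes


def relative_activity_percent(candidate_units: float, reference_units: float) -> float:
    """Candidate activity as a percentage of the reference (same enzyme amount assumed)."""
    if reference_units <= 0:
        raise ValueError("reference activity must be positive")
    return 100.0 * candidate_units / reference_units


def meets_threshold(candidate_units: float, reference_units: float,
                    min_percent: float = 20.0) -> bool:
    """Check a relative-activity threshold such as 20%, 40%, ... 100%."""
    return relative_activity_percent(candidate_units, reference_units) >= min_percent


if __name__ == "__main__":
    # Hypothetical assays: candidate releases 3.0 umol in 10 min, reference 6.0 umol in 10 min.
    candidate = activity_units(3.0, 10.0)
    reference = activity_units(6.0, 10.0)
    print(relative_activity_percent(candidate, reference))  # approximately 50.0
    print(meets_threshold(candidate, reference, 40.0))      # True
```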

Identity: In the present context, the homology between two amino acid sequences or
between two nucleotide sequences is described by the parameter "identity".

For purposes of the present invention, the degree of identity between two amino acid
sequences is determined by using the program FASTA included in version 2.0x of the FASTA
program package (see W. R. Pearson and D. J. Lipman (1988), "Improved Tools for
Biological Sequence Comparison", PNAS 85:2444-2448; and W. R. Pearson (1990), "Rapid and
Sensitive Sequence Comparison with FASTP and FASTA", Methods in Enzymology 183:63-98).
The scoring matrix used was BLOSUM50, the gap penalty was -12, and the gap extension
penalty was -2.

The degree of identity between two nucleotide sequences is determined using the 
same algorithm and software package as described above. The scoring matrix used was the 
identity matrix, gap penalty was -16, and gap extension penalty was -4. 
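The identity definition above relies on FASTA to produce the alignment. Once an alignment is available, the identity percentage and the number of differing positions can be read off it as in the Python sketch below. This is not a reimplementation of FASTA: it assumes the two sequences have already been aligned (with "-" marking gaps) and divides identical positions by the number of aligned columns, gap columns included; FASTA's own reporting convention may differ, so the sketch is illustrative only.

```python
GAP = "-"


def _columns(aligned_a: str, aligned_b: str) -> list:
    """Aligned column pairs, skipping columns that are gaps in both sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return [(x, y) for x, y in zip(aligned_a, aligned_b) if not (x == GAP and y == GAP)]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical residues as a percentage of aligned columns (gap columns included)."""
    cols = _columns(aligned_a, aligned_b)
    if not cols:
        raise ValueError("empty alignment")
    identical = sum(1 for x, y in cols if x == y)
    return 100.0 * identical / len(cols)


def count_differences(aligned_a: str, aligned_b: str) -> int:
    """Number of substituted, inserted or deleted positions in the alignment."""
    return sum(1 for x, y in _columns(aligned_a, aligned_b) if x != y)


if __name__ == "__main__":
    # Hypothetical toy alignment (not from the sequence listing): 4 of 6 columns identical.
    a, b = "MKT-AL", "MKTGAV"
    print(round(percent_identity(a, b), 2))  # 66.67
    print(count_differences(a, b))           # 2
```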

Fragment: When used herein, a "fragment" of a sequence selected from the group
consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10,
SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID
NO:22, SEQ ID NO:24, and SEQ ID NO:26 is a polypeptide having one or more amino acids
deleted from the amino and/or carboxyl terminus of this amino acid sequence.

Allelic variant: In the present context, the term "allelic variant" denotes any of two or 
more alternative forms of a gene occupying the same chromosomal locus. Allelic variation 
arises naturally through mutation, and may result in polymorphism within populations. Gene 
mutations can be silent (no change in the encoded polypeptide) or may encode polypeptides 
having altered amino acid sequences. An allelic variant of a polypeptide is a polypeptide
encoded by an allelic variant of a gene. 

Substantially pure polynucleotide: The term "substantially pure polynucleotide" as used
herein refers to a polynucleotide preparation wherein the polynucleotide has been removed
from its natural genetic milieu, and is thus free of other extraneous or unwanted coding
sequences and is in a form suitable for use within genetically engineered protein production
systems. Thus, a substantially pure polynucleotide contains at the most 10% by weight of
other polynucleotide material with which it is natively associated (lower percentages of other
polynucleotide material are preferred, e.g. at the most 8% by weight, at the most 6% by
weight, at the most 5% by weight, at the most 4% by weight, at the most 3% by weight, at the most 2%
by weight, at the most 1% by weight, and at the most 0.5% by weight). A substantially pure
polynucleotide may, however, include naturally occurring 5' and 3' untranslated regions, such
as promoters and terminators. It is preferred that the substantially pure polynucleotide is at
least 92% pure, i.e. that the polynucleotide constitutes at least 92% by weight of the total
polynucleotide material present in the preparation, and higher percentages are preferred,
such as at least 94% pure, at least 95% pure, at least 96% pure, at least 97% pure,
at least 98% pure, at least 99% pure, and at least 99.5% pure. The polynucleotides
disclosed herein are preferably in a substantially pure form. In particular, it is preferred that
the polynucleotides disclosed herein are in "essentially pure form", i.e. that the polynucleotide
preparation is essentially free of other polynucleotide material with which it is natively
associated. Herein, the term "substantially pure polynucleotide" is synonymous with the terms
"isolated polynucleotide" and "polynucleotide in isolated form".

Modification(s): In the context of the present invention the term "modification(s)" is
intended to mean any chemical modification of a polypeptide consisting of an amino acid
sequence selected from the group consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6,
SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID
NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, and SEQ ID NO:26, as well as
genetic manipulation of the DNA encoding that polypeptide. The modification(s) can be
replacement(s) of the amino acid side chain(s), substitution(s), deletion(s) and/or
insertion(s) in or at the amino acid(s) of interest.

Artificial variant: When used herein, the term "artificial variant" means a polypeptide
having cellobiohydrolase II activity which has been produced by an organism expressing a
modified gene as compared to SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5,
SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID
NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, and SEQ ID NO:25.
The modified gene, from which said variant is produced when expressed in a suitable host, is
obtained through human intervention by modification of a nucleotide sequence selected from
the group consisting of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID
NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ
ID NO:19, SEQ ID NO:21, SEQ ID NO:23, and SEQ ID NO:25.

cDNA: The term "cDNA", when used in the present context, is intended to cover a DNA
molecule which can be prepared by reverse transcription from a mature, spliced mRNA
molecule derived from a eukaryotic cell. cDNA lacks the intron sequences that are usually
present in the corresponding genomic DNA. The initial, primary RNA transcript is a precursor
to mRNA, and it goes through a series of processing events before appearing as mature
spliced mRNA. These events include the removal of intron sequences by a process called
splicing. When cDNA is derived from mRNA, it therefore lacks intron sequences.
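The effect of splicing on the sequence carried by a cDNA can be illustrated by joining the exon segments of a genomic sequence, which yields the intron-free sequence. The Python sketch below assumes the exon boundaries are already known and given as 1-based inclusive coordinates on the same strand, in transcript order; in practice such boundaries come from experimental or annotation data, and the short sequence used here is a hypothetical toy example.

```python
from typing import Sequence, Tuple


def splice(genomic: str, exons: Sequence[Tuple[int, int]]) -> str:
    """Concatenate exons (1-based inclusive, in transcript order), dropping the introns."""
    pieces = []
    previous_end = 0
    for start, end in exons:
        # Exons must lie inside the sequence, be non-empty and not overlap the previous one.
        if start <= previous_end or end < start or end > len(genomic):
            raise ValueError(f"invalid or overlapping exon {start}-{end}")
        pieces.append(genomic[start - 1:end])
        previous_end = end
    return "".join(pieces)


if __name__ == "__main__":
    # Hypothetical gene: exon 1 = 1-6, intron (GT...AG) = 7-17, exon 2 = 18-23.
    genomic = "ATGAAG" + "GTAAGTTTCAG" + "CCCTAA"
    print(splice(genomic, [(1, 6), (18, 23)]))  # ATGAAGCCCTAA
```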

Nucleic acid construct: When used herein, the term "nucleic acid construct" means a
nucleic acid molecule, either single- or double-stranded, which is isolated from a naturally
occurring gene or which has been modified to contain segments of nucleic acids in a manner
that would not otherwise exist in nature. The term nucleic acid construct is synonymous with
the term "expression cassette" when the nucleic acid construct contains the control
sequences required for expression of a coding sequence of the present invention.

Control sequence: The term "control sequences" is defined herein to include all 
components, which are necessary or advantageous for the expression of a polypeptide of the 
present invention. Each control sequence may be native or foreign to the nucleotide
sequence encoding the polypeptide. Such control sequences include, but are not limited to, a 
leader, polyadenylation sequence, propeptide sequence, promoter, signal peptide sequence, 
and transcription terminator. At a minimum, the control sequences include a promoter, and 
transcriptional and translational stop signals. The control sequences may be provided with 
linkers for the purpose of introducing specific restriction sites facilitating ligation of the control 
sequences with the coding region of the nucleotide sequence encoding a polypeptide. 

Operably linked: The term "operably linked" is defined herein as a configuration in
which a control sequence is appropriately placed at a position relative to the coding sequence
of the DNA sequence such that the control sequence directs the expression of a polypeptide.

Coding sequence: When used herein the term "coding sequence" is intended to cover a
nucleotide sequence which directly specifies the amino acid sequence of its protein product.
The boundaries of the coding sequence are generally determined by an open reading frame,
which usually begins with the ATG start codon. The coding sequence typically includes DNA,
cDNA, and recombinant nucleotide sequences.
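A coding sequence bounded by an open reading frame that starts with ATG can be located with a simple scan. The Python sketch below searches only the given strand, uses the standard stop codons TAA, TAG and TGA, and returns the first ATG-initiated reading frame that reaches an in-frame stop codon. Real gene annotation must also consider the other strand, introns and alternative start codons, so this is a simplified illustration; the function name and toy sequence are hypothetical.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(seq: str, min_codons: int = 1) -> Optional[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) of the first ATG...stop ORF, stop codon included.

    min_codons is the minimum number of codons before the stop codon (ATG included).
    Returns None if no ATG is followed by an in-frame stop codon.
    """
    seq = seq.upper()
    start = seq.find("ATG")
    while start != -1:
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    return start + 1, i + 3
                break  # stop codon reached too early; try the next ATG
        start = seq.find("ATG", start + 1)
    return None


if __name__ == "__main__":
    seq = "CCATGAAACCCTAAGG"  # hypothetical toy sequence
    print(first_orf(seq))  # (3, 14)
```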

Expression: In the present context, the term "expression" includes any step involved in
the production of the polypeptide including, but not limited to, transcription,
post-transcriptional modification, translation, post-translational modification, and secretion.

Expression vector: In the present context, the term "expression vector" covers a DNA 
molecule, linear or circular, that comprises a segment encoding a polypeptide of the 
invention, and which is operably linked to additional segments that provide for its
transcription. 

Host cell: The term "host cell", as used herein, includes any cell type which is
susceptible to transformation with a nucleic acid construct.

The terms "polynucleotide probe", "hybridization" as well as the various stringency 
conditions are defined in the section entitled "Polypeptides Having Cellobiohydrolase II 
Activity". 

Thermostability: The term "thermostability", as used herein, refers to thermostability
measured as described in Example 2.

Detailed Description of the Invention 

Polypeptides Having Cellobiohydrolase II Activity

In a first embodiment, the present invention relates to polypeptides having
cellobiohydrolase II activity, where the polypeptides comprise, preferably consist of, an amino
acid sequence which has a degree of identity to an amino acid sequence selected from the
group consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID
NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ
ID NO:22, SEQ ID NO:24, and SEQ ID NO:26 (i.e., the mature polypeptide) of at least 65%,
preferably at least 70%, e.g. at least 75%, more preferably at least 80%, such as at least
85%, even more preferably at least 90%, most preferably at least 95%, e.g. at least 96%,
such as at least 97%, and even most preferably at least 98%, such as at least 99%
(hereinafter "homologous polypeptides"). In an interesting embodiment, the amino acid
sequence differs by at the most ten amino acids (e.g. by ten amino acids), in particular by at
the most five amino acids (e.g. by five amino acids), such as by at the most four amino acids
(e.g. by four amino acids), e.g. by at the most three amino acids (e.g. by three amino acids)
from an amino acid sequence selected from the group consisting of SEQ ID NO:2, SEQ ID
NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID
NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, and SEQ ID NO:26.
In a particularly interesting embodiment, the amino acid sequence differs by at the most two
amino acids (e.g. by two amino acids), such as by one amino acid, from an amino acid
sequence selected from the group consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6,
SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID
NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, and SEQ ID NO:26.

Preferably, the polypeptides of the present invention comprise an amino acid sequence
selected from the group consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID
NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ
ID NO:20, SEQ ID NO:22, SEQ ID NO:24, and SEQ ID NO:26; an allelic variant thereof; or a
fragment thereof that has cellobiohydrolase II activity. In another preferred embodiment, the
polypeptide of the present invention consists of an amino acid sequence selected from the
group consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID
NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ
ID NO:22, SEQ ID NO:24, and SEQ ID NO:26.

The polypeptide of the invention may be a wild-type cellobiohydrolase II identified and
isolated from a natural source. Such wild-type polypeptides may be specifically screened for
by standard techniques known in the art, such as molecular screening as described in
Example 1. Furthermore, the polypeptide of the invention may be prepared by the DNA
shuffling technique, such as described in J.E. Ness et al., Nature Biotechnology 17, 893-896
(1999). Moreover, the polypeptide of the invention may be an artificial variant which
comprises, preferably consists of, an amino acid sequence that has at least one substitution,
deletion and/or insertion of an amino acid as compared to an amino acid sequence selected
from the group consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ
ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20,
SEQ ID NO:22, SEQ ID NO:24, and SEQ ID NO:26. Such artificial variants may be
constructed by standard techniques known in the art, such as by site-directed/random
mutagenesis of the polypeptide comprising an amino acid sequence selected from the group
consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10,
SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID
NO:22, SEQ ID NO:24, and SEQ ID NO:26. In one embodiment of the invention, amino acid
changes (in the artificial variant as well as in wild-type polypeptides) are of a minor nature,
that is, conservative amino acid substitutions that do not significantly affect the folding and/or
activity of the protein; small deletions, typically of one to about 30 amino acids; small amino-
or carboxyl-terminal extensions, such as an amino-terminal methionine residue; a small linker
peptide of up to about 20-25 residues; or a small extension that facilitates purification by
changing net charge or another function, such as a poly-histidine tract, an antigenic epitope
or a binding domain.

Examples of conservative substitutions are within the group of basic amino acids
(arginine, lysine and histidine), acidic amino acids (glutamic acid and aspartic acid), polar
amino acids (glutamine and asparagine), hydrophobic amino acids (leucine, isoleucine, valine
and methionine), aromatic amino acids (phenylalanine, tryptophan and tyrosine), and small
amino acids (glycine, alanine, serine and threonine). Amino acid substitutions which do not
generally alter the specific activity are known in the art and are described, for example, by H.
Neurath and R.L. Hill, 1979, In: The Proteins, Academic Press, New York. The most commonly
occurring exchanges are Ala/Ser, Val/Ile, Asp/Glu, Thr/Ser, Ala/Gly, Ala/Thr, Ser/Asn,
Ala/Val, Ser/Gly, Tyr/Phe, Ala/Pro, Lys/Arg, Asp/Asn, Leu/Ile, Leu/Val, Ala/Glu, and Asp/Gly,
as well as these in reverse.
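The conservative substitution groups listed above can be written as a small lookup, for example to flag which positions of an equal-length variant carry a conservative change. This is an illustrative sketch only: the names are hypothetical, and it classifies purely by the six groups, so exchanges from the "most commonly occurring" list that cross groups (e.g. Ala/Pro) are reported as non-conservative.

```python
# The six groups named in the text, as one-letter amino acid codes.
CONSERVATIVE_GROUPS = {
    "basic": set("RKH"),
    "acidic": set("ED"),
    "polar": set("QN"),
    "hydrophobic": set("LIVM"),
    "aromatic": set("FWY"),
    "small": set("GAST"),
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if two different residues fall within the same group."""
    original, replacement = original.upper(), replacement.upper()
    if original == replacement:
        return False  # identical residues are not a substitution
    return any(original in group and replacement in group
               for group in CONSERVATIVE_GROUPS.values())


def substitutions(reference: str, variant: str):
    """List (position, from, to, conservative?) for equal-length sequences.

    Positions are 1-based. Insertions and deletions are not handled here.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [(pos, a, b, is_conservative(a, b))
            for pos, (a, b) in enumerate(zip(reference, variant), start=1)
            if a != b]
```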

In an interesting embodiment of the invention, the amino acid changes are of such a
nature that the physico-chemical properties of the polypeptides are altered. For example,
amino acid changes may be made which improve the thermal stability of the polypeptide,
alter the substrate specificity, change the pH optimum, and the like.

Preferably, the number of such substitutions, deletions and/or insertions as compared
to an amino acid sequence selected from the group consisting of SEQ ID NO:2, SEQ ID
NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID
NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, and SEQ ID NO:26
is at the most 10, such as at the most 9, e.g. at the most 8, more preferably at the most 7,
e.g. at the most 6, such as at the most 5, most preferably at the most 4, e.g. at the most 3,
such as at the most 2, in particular at the most 1.

The present inventors have isolated nucleotide sequences encoding polypeptides
having cellobiohydrolase II activity from microorganisms selected from the group
consisting of Chaetomium thermophilum, Myceliophthora thermophila, Acremonium
thermophilum, Thielavia australiensis, Thielavia microspora, Aspergillus tubingensis,
Aspergillus tubingensis syn. Aspergillus neotubingensis Frisvad sp.nov., Gloeophyllum
trabeum, Meripilus giganteus, Trichophaea saccata, Stilbella annulata and
Malbranchea cinnamomea.

Thus, in a second embodiment, the present invention relates to polypeptides comprising an
amino acid sequence which has at least 65% identity with the polypeptide encoded by the
cellobiohydrolase II encoding part of the nucleotide sequence present in an organism
selected from the group consisting of Chaetomium thermophilum CGMCC 0859,
Myceliophthora thermophila CGMCC 0862, Acremonium sp. T178-4 CGMCC 0857,
Acremonium sp. T178-4, Melanocarpus sp. CGMCC 0861, Thielavia microspora CGMCC
0863, Aspergillus sp. T186-2 CGMCC 0858, Thielavia australiensis CGMCC 0864,
Gloeophyllum trabeum ATCC 11.39, Aspergillus tubingensis CBS 161.79, Trichophaea
saccata CBS 804.70, Stilbella annulata CBS 185.70, and Malbranchea cinnamomea CBS
115.68. In an interesting embodiment of the invention, the polypeptide comprises an amino
acid sequence which has at least 70%, e.g. at least 75%, preferably at least 80%, such as at
least 85%, more preferably at least 90%, most preferably at least 95%, e.g. at least 96%,
such as at least 97%, and even most preferably at least 98%, such as at least 99% identity
with the polypeptide encoded by the cellobiohydrolase II encoding part of the nucleotide
sequence present in an organism selected from the group consisting of Chaetomium
thermophilum CGMCC 0859, Myceliophthora thermophila CGMCC 0862, Acremonium sp.
T178-4 CGMCC 0857, Acremonium sp. T178-4, Melanocarpus sp. CGMCC 0861, Thielavia
microspora CGMCC 0863, Aspergillus sp. T186-2 CGMCC 0858, Thielavia australiensis
CGMCC 0864, Gloeophyllum trabeum ATCC 11.39, Aspergillus tubingensis CBS 161.79,
Trichophaea saccata CBS 804.70, Stilbella annulata CBS 185.70, and Malbranchea
cinnamomea CBS 115.68 (hereinafter "homologous polypeptides"). In an interesting
embodiment, the amino acid sequence differs by at the most ten amino acids (e.g. by ten
amino acids), in particular by at the most five amino acids (e.g. by five amino acids), such as
by at the most four amino acids (e.g. by four amino acids), e.g. by at the most three amino
acids (e.g. by three amino acids) from the polypeptide encoded by the cellobiohydrolase II
encoding part of the nucleotide sequence present in an organism selected from the group
consisting of Chaetomium thermophilum CGMCC 0859, Myceliophthora thermophila CGMCC
0862, Acremonium sp. T178-4 CGMCC 0857, Acremonium sp. T178-4, Melanocarpus sp.
CGMCC 0861, Thielavia microspora CGMCC 0863, Aspergillus sp. T186-2 CGMCC 0858,
Thielavia australiensis CGMCC 0864, Gloeophyllum trabeum ATCC 11.39, Aspergillus
tubingensis CBS 161.79, Trichophaea saccata CBS 804.70, Stilbella annulata CBS 185.70,
and Malbranchea cinnamomea CBS 115.68. In a particularly interesting embodiment, the
amino acid sequence differs by at the most two amino acids (e.g. by two amino acids), such
as by one amino acid, from the polypeptide encoded by the cellobiohydrolase II encoding part
of the nucleotide sequence present in an organism selected from the group consisting of
Chaetomium thermophilum CGMCC 0859, Myceliophthora thermophila CGMCC 0862,
Acremonium sp. T178-4 CGMCC 0857, Acremonium sp. T178-4, Melanocarpus sp. CGMCC
0861, Thielavia microspora CGMCC 0863, Aspergillus sp. T186-2 CGMCC 0858, Thielavia
australiensis CGMCC 0864, Gloeophyllum trabeum ATCC 11.39, Aspergillus tubingensis
CBS 161.79, Trichophaea saccata CBS 804.70, Stilbella annulata CBS 185.70, and
Malbranchea cinnamomea CBS 115.68.

In a third embodiment, the present invention relates to polypeptides having
cellobiohydrolase II activity which are encoded by nucleotide sequences which hybridize
under very low stringency conditions, preferably under low stringency conditions, more
preferably under medium stringency conditions, more preferably under medium-high
stringency conditions, even more preferably under high stringency conditions, and most
preferably under very high stringency conditions with a polynucleotide probe selected from
the group consisting of the complementary strand of the nucleotides selected from the group
consisting of:

nucleotides 63 to 1493 of SEQ ID NO:1,
nucleotides 1 to 246 of SEQ ID NO:3,
nucleotides 1 to 417 of SEQ ID NO:5,
nucleotides 1 to 306 of SEQ ID NO:7,
nucleotides 1 to 432 of SEQ ID NO:9,
nucleotides 1 to 297 of SEQ ID NO:11,
nucleotides 1 to 420 of SEQ ID NO:13,
nucleotides 1 to 330 of SEQ ID NO:15,
nucleotides 1 to 1221 of SEQ ID NO:15,
nucleotides 1 to 429 of SEQ ID NO:17,
nucleotides 1 to 213 of SEQ ID NO:19,
nucleotides 43 to 701 of SEQ ID NO:21,
nucleotides 21 to 1394 of SEQ ID NO:23, and
nucleotides 41 to 1210 of SEQ ID NO:25.

In another embodiment, the present invention relates to polypeptides having 
cellobiohydrolase II activity which are encoded by the cellobiohydrolase II encoding part of 
the nucleotide sequence present in a microorganism selected from the group consisting of: 

a microorganism belonging to the family Chaetomiaceae, preferably to the genus
Chaetomium, more preferably to the species Chaetomium thermophilum,

a microorganism belonging to the genus Myceliophthora, preferably to the species
Myceliophthora thermophila,

a microorganism belonging to the species Acremonium thermophilum,

a microorganism belonging to the family Chaetomiaceae, preferably to the genus
Thielavia, preferably to the species Thielavia australiensis,

a microorganism belonging to the genus Aspergillus, preferably belonging to the black
Aspergilli,

a microorganism belonging to the family Chaetomiaceae, preferably to the genus
Thielavia, preferably to the species Thielavia microspora,

a microorganism belonging to the genus Aspergillus, preferably belonging to the black
Aspergilli, more preferably to the species Aspergillus tubingensis, and most preferably
to the species A. neotubingensis Frisvad sp.nov.,

a microorganism belonging to the Polyporales, preferably belonging to the family
Fomitopsidaceae, more preferably belonging to the genus Gloeophyllum, most preferably
to the species Gloeophyllum trabeum,

a microorganism belonging to the Hymenochaetales, preferably belonging to the family
Rigidiporaceae, preferably belonging to the genus Meripilus, more preferably to the
species Meripilus giganteus,

a microorganism belonging to the Pezizomycotina, preferably belonging to the Pezizales,
preferably belonging to the family Pyronemataceae or the family Sarcosomataceae,
more preferably belonging to the genus Trichophaea or the genus Pseudoplectania,
most preferably Trichophaea saccata,

a microorganism belonging to the species Stilbella annulata, and

a microorganism belonging to the species Malbranchea cinnamomea.

A nucleotide sequence selected from the group consisting of SEQ ID NO:1, SEQ ID
NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID
NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, and SEQ ID NO:25,
or a subsequence thereof, as well as an amino acid sequence selected from the group
consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ
ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22,
SEQ ID NO:24, and SEQ ID NO:26, or a fragment thereof, may be used to design a
polynucleotide probe to identify and clone DNA encoding polypeptides having
cellobiohydrolase II activity from strains of different genera or species according to methods
well known in the art. In particular, such probes can be used for hybridization with the
genomic DNA or cDNA of the genus or species of interest, following standard Southern
blotting procedures, in order to identify and isolate the corresponding gene therein. Such
probes can be considerably shorter than the entire sequence, but should be at least 15,
preferably at least 25, more preferably at least 35 nucleotides in length, such as at least 70
nucleotides in length. It is, however, preferred that the polynucleotide probe is at least 100
nucleotides in length. For example, the polynucleotide probe may be at least 200, at least
300, at least 400 or at least 500 nucleotides in length. Even longer probes may be used, e.g.,
polynucleotide probes which are at least 600, at least 700, at least 800, or at least 900
nucleotides in length. Both DNA and RNA probes can be used. The probes are typically
labeled for detecting the corresponding gene (for example, with ³²P, ³H, ³⁵S, biotin, or
avidin).

Thus, a genomic DNA or cDNA library prepared from such other organisms may be
screened for DNA which hybridizes with the probes described above and which encodes a
polypeptide having cellobiohydrolase II activity. Genomic or other DNA from such other
organisms may be separated by agarose or polyacrylamide gel electrophoresis, or other
separation techniques. DNA from the libraries or the separated DNA may be transferred to
and immobilized on nitrocellulose or other suitable carrier materials. In order to identify a
clone or DNA which is homologous with one of the sequences shown in SEQ ID NO:1, SEQ
ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ
ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, and SEQ ID
NO:25, the carrier material with the immobilized DNA is used in a Southern blot.

For purposes of the present invention, hybridization indicates that the nucleotide
sequence hybridizes to a labeled polynucleotide probe which hybridizes to any of the
nucleotide sequences shown in SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7,
SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID
NO:19, SEQ ID NO:21, SEQ ID NO:23, and SEQ ID NO:25 under very low to very high
stringency conditions. Molecules to which the polynucleotide probe hybridizes under these
conditions may be detected using X-ray film or by any other method known in the art.
Whenever the term "polynucleotide probe" is used in the present context, it is to be
understood that such a probe contains at least 15 nucleotides.

In an interesting embodiment, the polynucleotide probe is the complementary strand of
the nucleotides selected from the group consisting of:

nucleotides 63 to 1493 of SEQ ID NO:1,
nucleotides 1 to 246 of SEQ ID NO:3,
nucleotides 1 to 1272 of SEQ ID NO:3,
nucleotides 1 to 417 of SEQ ID NO:5,
nucleotides 1 to 306 of SEQ ID NO:7,
nucleotides 1 to 432 of SEQ ID NO:9,
nucleotides 1 to 297 of SEQ ID NO:11,
nucleotides 1 to 420 of SEQ ID NO:13,
nucleotides 1 to 330 of SEQ ID NO:15,
nucleotides 1 to 1221 of SEQ ID NO:15,
nucleotides 1 to 429 of SEQ ID NO:17,
nucleotides 1 to 213 of SEQ ID NO:19,
nucleotides 43 to 701 of SEQ ID NO:21,
nucleotides 21 to 1394 of SEQ ID NO:23, and
nucleotides 41 to 1210 of SEQ ID NO:25.
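A probe defined as "the complementary strand of nucleotides X to Y" of a sequence can be derived mechanically. The sketch below assumes 1-based, inclusive coordinates and writes the complementary strand 5' to 3' (that is, as the reverse complement); it also enforces the stated minimum of 15 nucleotides for a polynucleotide probe. The function names and the example sequence are hypothetical placeholders, not SEQ ID NO:1.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def region(seq: str, start: int, end: int) -> str:
    """Nucleotides start..end, 1-based and inclusive
    (as in 'nucleotides 63 to 1493 of SEQ ID NO:1')."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("range lies outside the sequence")
    return seq[start - 1:end]


def complementary_strand(seq: str) -> str:
    """Complementary strand written 5'->3', i.e. the reverse complement."""
    return seq.translate(_COMPLEMENT)[::-1]


def make_probe(seq: str, start: int, end: int, min_length: int = 15) -> str:
    """Probe = complementary strand of the selected region."""
    probe = complementary_strand(region(seq, start, end))
    if len(probe) < min_length:
        raise ValueError(f"probe shorter than {min_length} nucleotides")
    return probe
```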


In another interesting embodiment, the polynucleotide probe is the complementary
strand of the nucleotide sequence which encodes a polypeptide selected from the group
consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10,
SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID
NO:22, SEQ ID NO:24, and SEQ ID NO:26. In a further interesting embodiment, the
polynucleotide probe is the complementary strand of a nucleotide sequence selected from
the group consisting of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID
NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ
ID NO:21, SEQ ID NO:23 and SEQ ID NO:25.

For long probes of at least 100 nucleotides in length, very low to very high stringency
conditions are defined as prehybridization and hybridization at 42°C in 5X SSPE, 1.0% SDS,
5X Denhardt's solution, 100 micrograms/ml sheared and denatured salmon sperm DNA,
following standard Southern blotting procedures. Preferably, the long probes of at least 100
nucleotides do not contain more than 1000 nucleotides. For long probes of at least 100
nucleotides in length, the carrier material is finally washed three times each for 15 minutes
using 2 x SSC, 0.1% SDS at 42°C (very low stringency), preferably washed three times each
for 15 minutes using 0.5 x SSC, 0.1% SDS at 42°C (low stringency), more preferably washed
three times each for 15 minutes using 0.2 x SSC, 0.1% SDS at 42°C (medium stringency),
even more preferably washed three times each for 15 minutes using 0.2 x SSC, 0.1% SDS at
55°C (medium-high stringency), most preferably washed three times each for 15 minutes
using 0.1 x SSC, 0.1% SDS at 60°C (high stringency), in particular washed three times each
for 15 minutes using 0.1 x SSC, 0.1% SDS at 68°C (very high stringency).
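The long-probe wash conditions can be captured as a small lookup table keyed by stringency level. The values below are transcribed from the paragraph above, with the medium-high and high wash temperatures read as 55°C and 60°C; the class and function names (`Wash`, `wash_protocol`) are hypothetical and only illustrate the structure of the definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wash:
    """One final-wash condition for long probes."""
    ssc: float            # SSC concentration, in multiples of 1 x SSC
    sds_percent: float    # SDS concentration, %
    temp_c: int           # wash temperature, degrees Celsius
    washes: int = 3       # number of washes
    minutes_each: int = 15


# Ordered from least to most stringent, as listed in the text.
LONG_PROBE_WASHES = {
    "very low": Wash(2.0, 0.1, 42),
    "low": Wash(0.5, 0.1, 42),
    "medium": Wash(0.2, 0.1, 42),
    "medium-high": Wash(0.2, 0.1, 55),
    "high": Wash(0.1, 0.1, 60),
    "very high": Wash(0.1, 0.1, 68),
}


def wash_protocol(stringency: str, probe_length: int) -> str:
    """Describe the final wash for a probe of at least 100 nucleotides."""
    if probe_length < 100:
        raise ValueError("long-probe conditions apply to probes of at least 100 nucleotides")
    w = LONG_PROBE_WASHES[stringency]
    return (f"{w.washes} x {w.minutes_each} min in {w.ssc} x SSC, "
            f"{w.sds_percent}% SDS at {w.temp_c}°C")
```

Stringency increases down the table either by lowering the salt (SSC) concentration or by raising the wash temperature, which the ordering of the dictionary makes explicit.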

Although not particularly preferred, it is contemplated that shorter probes, e.g. probes
which are from about 15 to 99 nucleotides in length, such as from about 15 to about 70
nucleotides in length, may also be used. For such short probes, stringency conditions are
defined as prehybridization, hybridization, and washing post-hybridization at 5°C to 10°C
below the calculated Tm using the calculation according to Bolton and McCarthy (1962,
Proceedings of the National Academy of Sciences USA 48:1390) in 0.9 M NaCl, 0.09 M
Tris-HCl pH 7.6, 6 mM EDTA, 0.5% NP-40, 1X Denhardt's solution, 1 mM sodium
pyrophosphate, 1 mM sodium monobasic phosphate, 0.1 mM ATP, and 0.2 mg of yeast RNA
per ml, following standard Southern blotting procedures.

For short probes which are about 15 nucleotides to 99 nucleotides in length, the carrier
material is washed once in 6X SSC plus 0.1% SDS for 15 minutes and twice each for 15
minutes using 6X SSC at 5°C to 10°C below the calculated Tm.
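For short probes the conditions are set relative to a calculated Tm. The text cites Bolton and McCarthy (1962) but does not reproduce the formula, so the sketch below assumes the widely used salt-adjusted form Tm = 81.5 + 16.6 log10[Na+] + 0.41(%G+C) - 600/N, where N is the probe length; treat these constants as an assumption rather than as the cited calculation. Only the 0.9 M NaCl of the hybridization buffer is counted toward [Na+], and the function names are hypothetical.

```python
import math


def tm_salt_adjusted(probe: str, na_molar: float = 0.9) -> float:
    """Approximate melting temperature (°C) of a probe.

    ASSUMED formula: Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 600/N.
    """
    seq = probe.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty probe")
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 600.0 / n


def short_probe_window(probe: str, na_molar: float = 0.9) -> tuple:
    """Temperature range from 10°C to 5°C below Tm, for probes of 15-99 nucleotides."""
    if not 15 <= len(probe) <= 99:
        raise ValueError("short-probe conditions apply to 15-99 nucleotides")
    tm = tm_salt_adjusted(probe, na_molar)
    return (tm - 10.0, tm - 5.0)
```

For a 20-mer with 50% G+C at 1 M Na+, this formula gives 81.5 + 20.5 - 30 = 72°C, so hybridization and washing would be carried out between 62°C and 67°C.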

Sources for Polypeptides Having Cellobiohydrolase II Activity 

A polypeptide of the present invention may be obtained from microorganisms of any
genus. For purposes of the present invention, the term "obtained from" as used herein shall
mean that the polypeptide encoded by the nucleotide sequence is produced by a cell in which
the nucleotide sequence is naturally present or into which the nucleotide sequence has been
inserted. In a preferred embodiment, the polypeptide is secreted extracellularly.

A polypeptide of the present invention may be a bacterial polypeptide. For example, the
polypeptide may be a Gram-positive bacterial polypeptide such as a Bacillus polypeptide, e.g.,
a Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus
coagulans, Bacillus lautus, Bacillus lentus, Bacillus licheniformis, Bacillus megaterium,
Bacillus stearothermophilus, Bacillus subtilis, or Bacillus thuringiensis polypeptide; or a
Streptomyces polypeptide, e.g., a Streptomyces lividans or Streptomyces murinus
polypeptide; or a Gram-negative bacterial polypeptide, e.g., an E. coli or a Pseudomonas sp.
polypeptide.

A polypeptide of the present invention may be a fungal polypeptide, and preferably a
yeast polypeptide such as a Candida, Kluyveromyces, Neocallimastix, Pichia, Piromyces,
Saccharomyces, Schizosaccharomyces, or Yarrowia polypeptide; or more preferably a
filamentous fungal polypeptide such as an Acremonium, Aspergillus, Chaetomium,
Gloeophyllum, Malbranchea, Melanocarpus, Meripilus, Myceliophthora, Stilbella, Thielavia,
or Trichophaea polypeptide.

In an interesting embodiment, the polypeptide is a Saccharomyces carlsbergensis,
Saccharomyces cerevisiae, Saccharomyces diastaticus, Saccharomyces douglasii,
Saccharomyces kluyveri, Saccharomyces norbensis or Saccharomyces oviformis
polypeptide.

In a preferred embodiment, the polypeptide is a Chaetomium thermophilum,
Myceliophthora thermophila, Acremonium thermophilum, Thielavia australiensis, Aspergilli,
Thielavia microspora, Aspergillus tubingensis, Gloeophyllum trabeum, Meripilus giganteus,
Trichophaea saccata, Stilbella annulata, or Malbranchea cinnamomea polypeptide.

It will be understood that for the aforementioned species, the invention encompasses
both the perfect and imperfect states, and other taxonomic equivalents, e.g., anamorphs,
regardless of the species name by which they are known. Those skilled in the art will readily
recognize the identity of appropriate equivalents.

Furthermore, such polypeptides may be identified and obtained from other sources
including microorganisms isolated from nature (e.g., soil, water, plants, animals, etc.) using
the above-mentioned probes. Techniques for isolating microorganisms from natural habitats
are well known in the art. The nucleotide sequence may then be derived by similarly
screening a genomic or cDNA library of another microorganism. Once a nucleotide sequence
encoding a polypeptide has been detected with the probe(s), the sequence may be isolated
or cloned by utilizing techniques which are known to those of ordinary skill in the art (see,
e.g., J. Sambrook, E.F. Fritsch, and T. Maniatis, 1989, Molecular Cloning, A Laboratory
Manual, 2nd edition, Cold Spring Harbor, New York).

Polypeptides encoded by nucleotide sequences of the present invention also include
fused polypeptides or cleavable fusion polypeptides in which another polypeptide is fused at
the N-terminus or the C-terminus of the polypeptide or fragment thereof. A fused polypeptide
is produced by fusing a nucleotide sequence (or a portion thereof) encoding another
polypeptide to a nucleotide sequence (or a portion thereof) of the present invention.
Techniques for producing fusion polypeptides are known in the art, and include ligating the
coding sequences encoding the polypeptides so that they are in frame and that expression of
the fused polypeptide is under control of the same promoter(s) and terminator.

Polynucleotides and Nucleotide Sequences

The present invention also relates to polynucleotides having a nucleotide sequence
which encodes a polypeptide of the invention. In particular, the present invention relates to
polynucleotides consisting of a nucleotide sequence which encodes a polypeptide of the
invention. In a preferred embodiment, the nucleotide sequence is selected from the group
consisting of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ
ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21,
SEQ ID NO:23 and SEQ ID NO:25. The present invention also encompasses
polynucleotides comprising, preferably consisting of, nucleotide sequences which encode a
polypeptide consisting of an amino acid sequence selected from the group consisting of SEQ
ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ
ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24,
and SEQ ID NO:26, which differ from a nucleotide sequence selected from the group
consisting of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ
ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21,
SEQ ID NO:23 and SEQ ID NO:25 by virtue of the degeneracy of the genetic
code.
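Degeneracy of the genetic code means that different nucleotide sequences can encode the same amino acid sequence. A minimal sketch, assuming the standard genetic code (NCBI translation table 1) and coding sequences that start in frame, checks this by translating both sequences and comparing the results; the function names are hypothetical.

```python
BASES = "TCAG"
# Standard code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence into one-letter amino acids."""
    seq = dna.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("length is not a multiple of 3")
    try:
        return "".join(CODON_TABLE[seq[p:p + 3]] for p in range(0, len(seq), 3))
    except KeyError as err:
        raise ValueError(f"invalid codon {err}") from None


def same_protein(dna_a: str, dna_b: str) -> bool:
    """True if two coding sequences differ only by synonymous codons."""
    return translate(dna_a) == translate(dna_b)
```

For example, `ATGCTGAAA` and `ATGTTAAAG` both encode Met-Leu-Lys, so they differ only "by virtue of the degeneracy of the genetic code".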

The present invention also relates to polynucleotides comprising, preferably consisting
of, a subsequence of a nucleotide sequence selected from the group consisting of SEQ ID
NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID
NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23 and
SEQ ID NO:25 which encode fragments of an amino acid sequence selected from the group
consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ
ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22,
SEQ ID NO:24, and SEQ ID NO:26 that have cellobiohydrolase II activity. A subsequence of
a nucleotide sequence selected from the group consisting of SEQ ID NO:1, SEQ ID NO:3,
SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15,
SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23 and SEQ ID NO:25 is a
nucleotide sequence encompassed by a sequence selected from the group consisting of SEQ
ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID
NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23 and
SEQ ID NO:25 except that one or more nucleotides from the 5' and/or 3' end have been
deleted.
The present invention also relates to polynucleotides comprising, preferably consisting of, a
modified nucleotide sequence which comprises at least one modification in the mature
polypeptide coding sequence selected from the group consisting of SEQ ID NO:1, SEQ ID
NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID
NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23 and SEQ ID NO:25,
and where the modified nucleotide sequence encodes a polypeptide which consists of an
amino acid sequence selected from the group consisting of SEQ ID NO:2, SEQ ID NO:4,
SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID
NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, and SEQ ID NO:26.

The techniques used to isolate or clone a nucleotide sequence encoding a polypeptide
are known in the art and include isolation from genomic DNA, preparation from cDNA, or a
combination thereof. The cloning of the nucleotide sequences of the present invention from
such genomic DNA can be effected, e.g., by using the well known polymerase chain reaction
(PCR) or antibody screening of expression libraries to detect cloned DNA fragments with
shared structural features. See, e.g., Innis et al., 1990, PCR: A Guide to Methods and
Application, Academic Press, New York. Other amplification procedures such as ligase chain
reaction (LCR), ligated activated transcription (LAT) and nucleotide sequence-based
amplification (NASBA) may be used. The nucleotide sequence may be cloned from a strain
belonging to a genus selected from the group consisting of Chaetomium, Myceliophthora,
Melanocarpus, Acremonium, Thielavia, Aspergillus, Gloeophyllum, Meripilus, Trichophaea,
Stilbella and Malbranchea, or another or related organism and thus, for example, may be an
allelic or species variant of the polypeptide encoding region of the nucleotide sequence.

The nucleotide sequence may be obtained by standard cloning procedures used in
genetic engineering to relocate the nucleotide sequence from its natural location to a different
site where it will be reproduced. The cloning procedures may involve excision and isolation of
a desired fragment comprising the nucleotide sequence encoding the polypeptide, insertion
of the fragment into a vector molecule, and incorporation of the recombinant vector into a
host cell where multiple copies or clones of the nucleotide sequence will be replicated. The
nucleotide sequence may be of genomic, cDNA, RNA, semisynthetic or synthetic origin, or
any combination thereof.

The present invention also relates to a polynucleotide comprising, preferably consisting
of, a nucleotide sequence which has a degree of identity with a nucleotide sequence selected
from the group consisting of:

nucleotides 63 to 1493 of SEQ ID NO:1,
nucleotides 1 to 246 of SEQ ID NO:3,
nucleotides 1 to 1272 of SEQ ID NO:3,
nucleotides 1 to 417 of SEQ ID NO:5,
nucleotides 1 to 306 of SEQ ID NO:7,
nucleotides 1 to 432 of SEQ ID NO:9,
nucleotides 1 to 297 of SEQ ID NO:11,
nucleotides 1 to 420 of SEQ ID NO:13,
nucleotides 1 to 330 of SEQ ID NO:15,
nucleotides 1 to 1221 of SEQ ID NO:15,
nucleotides 1 to 429 of SEQ ID NO:17,
nucleotides 1 to 213 of SEQ ID NO:19,
nucleotides 43 to 701 of SEQ ID NO:21,
nucleotides 21 to 1394 of SEQ ID NO:23, and
nucleotides 41 to 1210 of SEQ ID NO:25,

of at least 70% identity, such as at least 75% identity; preferably, the nucleotide sequence
has at least 80% identity, e.g. at least 85% identity, such as at least 90% identity, more
preferably at least 95% identity, such as at least 96% identity, e.g. at least 97% identity, even
more preferably at least 98% identity, such as at least 99%. Preferably, the nucleotide
sequence encodes a polypeptide having cellobiohydrolase II activity. The degree of identity
between two nucleotide sequences is determined as described previously (see the section
entitled "Definitions").
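The "Definitions" section referred to above is not part of this excerpt, so the sketch below does not reproduce it. It assumes one common convention for two sequences that have already been aligned: identical positions multiplied by 100, divided by the alignment length minus the number of gap columns. The alignment step itself (for example, a global alignment program) is not shown, and the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    ASSUMED convention: 100 * identical positions / (alignment length - gap columns).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(aligned_a.upper(), aligned_b.upper()))
    identical = sum(1 for x, y in pairs if x == y and x != gap)
    gap_columns = sum(1 for x, y in pairs if x == gap or y == gap)
    denominator = len(pairs) - gap_columns
    if denominator == 0:
        raise ValueError("alignment contains only gap columns")
    return 100.0 * identical / denominator


def meets_threshold(aligned_a: str, aligned_b: str, minimum_percent: float) -> bool:
    """True if the identity is at least minimum_percent (e.g. 70, 80, 95)."""
    return percent_identity(aligned_a, aligned_b) >= minimum_percent
```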

Modification of a nucleotide sequence encoding a polypeptide of the present invention
may be necessary for the synthesis of a polypeptide which comprises an amino acid
sequence that has at least one substitution, deletion and/or insertion as compared to an
amino acid sequence selected from the group consisting of SEQ ID NO:2, SEQ ID NO:4,
SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID
NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24 and SEQ ID NO:26.
These artificial variants may differ in some engineered way from the polypeptide isolated
from its native source, e.g., variants that differ in specific activity, thermostability or pH
optimum.

It will be apparent to those skilled in the art that such modifications can be made
outside the regions critical to the function of the molecule and still result in an active
polypeptide. Amino acid residues essential to the activity of the polypeptide encoded by the
nucleotide sequence of the invention, and therefore preferably not subject to modification
such as substitution, may be identified according to procedures known in the art, such as
site-directed mutagenesis or alanine-scanning mutagenesis (see, e.g., Cunningham and
Wells, 1989, Science 244: 1081-1085). In the latter technique, mutations are introduced at
every positively charged residue in the molecule, and the resultant mutant molecules are
tested for cellobiohydrolase II activity to identify amino acid residues that are critical to the
activity of the molecule. Sites of substrate-enzyme interaction can also be determined by
analysis of the three-dimensional structure as determined by such techniques as nuclear
magnetic resonance analysis, crystallography or photoaffinity labelling (see, e.g., de Vos et
al., 1992, Science 255: 306-312; Smith et al., 1992, Journal of Molecular Biology 224:
899-904; Wlodaver et al., 1992, FEBS Letters 309: 59-64).
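As a purely illustrative sketch of how the variant panel for such a scan could be enumerated in silico, the Python function below lists every single alanine substitution at the positively charged residues of a protein sequence. Treating lysine, arginine and histidine as the positively charged set, the function name `alanine_scan`, and the toy peptide are assumptions for illustration; the sketch only designs the variants and does not model the activity testing.

```python
from typing import List, Tuple

# Residues treated as positively charged; counting histidine is an assumption
# (its side chain is only partly protonated near neutral pH).
POSITIVE = frozenset("KRH")

def alanine_scan(protein: str, targets=POSITIVE) -> List[Tuple[str, str]]:
    """List (label, variant) pairs, one single Ala substitution per target residue.

    Labels use 1-based positions, e.g. 'K2A' = Lys at position 2 replaced by Ala.
    """
    variants = []
    for i, residue in enumerate(protein):
        if residue in targets:
            variant = protein[:i] + "A" + protein[i + 1:]
            variants.append((f"{residue}{i + 1}A", variant))
    return variants

# Hypothetical toy peptide, not a sequence from the sequence listing.
for label, variant in alanine_scan("MKTAYRLH"):
    print(label, variant)
# K2A MATAYRLH
# R6A MKTAYALH
# H8A MKTAYRLA
```

Each variant would then be expressed and assayed; residues whose substitution abolishes activity are candidates for the "critical" set.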

Moreover, a nucleotide sequence encoding a polypeptide of the present invention may
be modified by introduction of nucleotide substitutions which do not give rise to another
amino acid sequence of the polypeptide encoded by the nucleotide sequence, but which
correspond to the codon usage of the host organism intended for production of the enzyme.
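The following Python sketch illustrates the idea of silent, host-adapted recoding: each codon is replaced by a synonymous codon assumed to be preferred by the host, and the translation is checked to be unchanged. The `PREFERRED` table is a hypothetical placeholder, not real codon-usage data for any host, and the helper names are illustrative only.

```python
from itertools import product

# Standard genetic code in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Hypothetical host preferences (one codon per amino acid); a real table would be
# derived from codon-usage data for the intended production host.
PREFERRED = {"K": "AAG", "L": "CTG", "R": "CGC", "*": "TAA"}

def codons(dna: str):
    """Split an in-frame coding sequence into codons."""
    if len(dna) % 3:
        raise ValueError("length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]

def translate(dna: str) -> str:
    return "".join(CODON_TABLE[c] for c in codons(dna))

def recode(dna: str, preferred: dict) -> str:
    """Replace each codon by the preferred synonymous codon, if one is listed."""
    out = []
    for codon in codons(dna):
        aa = CODON_TABLE[codon]
        new = preferred.get(aa, codon)
        if CODON_TABLE[new] != aa:  # only silent substitutions are allowed
            raise ValueError(f"{new} does not encode {aa}")
        out.append(new)
    return "".join(out)

original = "ATGAAATTAAGATGA"                      # toy ORF: Met-Lys-Leu-Arg-stop
adapted = recode(original, PREFERRED)
print(adapted)                                     # ATGAAGCTGCGCTAA
print(translate(original) == translate(adapted))   # True
```

The synonymity check inside `recode` is the property the paragraph above requires: the nucleotide sequence changes, the encoded amino acid sequence does not.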

The introduction of a mutation into the nucleotide sequence to exchange one nucleotide
for another nucleotide may be accomplished by site-directed mutagenesis using any of the
methods known in the art. Particularly useful is the procedure that utilizes a supercoiled,
double-stranded DNA vector with an insert of interest and two synthetic primers containing
the desired mutation. The oligonucleotide primers, each complementary to opposite strands
of the vector, extend during temperature cycling by means of Pfu DNA polymerase. On
incorporation of the primers, a mutated plasmid containing staggered nicks is generated.
Following temperature cycling, the product is treated with DpnI, which is specific for
methylated and hemimethylated DNA, to digest the parental DNA template and to select for
mutation-containing synthesized DNA. Other procedures known in the art may also be used.
For a general description of nucleotide substitution, see, e.g., Ford et al., 1991, Protein
Expression and Purification 2: 95-107.
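The primer-design step of the procedure above can be sketched in Python as follows. The function returns a pair of fully complementary primers centred on the desired point mutation. The flank length of 12 bases per side is an assumption chosen for illustration, not a value from this description, and the temperature cycling and DpnI digestion are wet-lab steps outside the scope of the sketch.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def mutagenic_primers(template: str, position: int, new_base: str, flank: int = 12):
    """Return (forward, reverse) primers carrying one point mutation.

    `position` is 1-based; the mutated base sits in the middle of the primer,
    with `flank` unchanged template bases on each side (flank length assumed).
    """
    i = position - 1
    if not flank <= i < len(template) - flank:
        raise ValueError("mutation too close to the template end for this flank")
    mutated = template[:i] + new_base + template[i + 1:]
    forward = mutated[i - flank:i + flank + 1]
    # The reverse primer anneals to the opposite strand at the same site.
    return forward, reverse_complement(forward)

fwd, rev = mutagenic_primers("AAAACCCCGGGGTTTTAAAACCCC", 13, "A", flank=4)
print(fwd, rev)   # GGGGATTTA TAAATCCCC
```

Making the two primers exact reverse complements mirrors the description of primers "each complementary to opposite strands of the vector".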

The present invention also relates to a polynucleotide comprising, preferably consisting
of, a nucleotide sequence which encodes a polypeptide having cellobiohydrolase II activity,
and which hybridizes under very low stringency conditions, preferably under low stringency
conditions, more preferably under medium stringency conditions, more preferably under
medium-high stringency conditions, even more preferably under high stringency conditions,
and most preferably under very high stringency conditions with a polynucleotide probe
selected from the group consisting of

(i) the complementary strand of the nucleotides selected from the group consisting of:
nucleotides 63 to 1493 of SEQ ID NO:1,
nucleotides 1 to 246 of SEQ ID NO:3,
nucleotides 1 to 1272 of SEQ ID NO:3,
nucleotides 1 to 417 of SEQ ID NO:5,
nucleotides 1 to 306 of SEQ ID NO:7,
nucleotides 1 to 432 of SEQ ID NO:9,
nucleotides 1 to 297 of SEQ ID NO:11,
nucleotides 1 to 420 of SEQ ID NO:13,
nucleotides 1 to 330 of SEQ ID NO:15,
nucleotides 1 to 1221 of SEQ ID NO:15,
nucleotides 1 to 429 of SEQ ID NO:17,
nucleotides 1 to 213 of SEQ ID NO:19,
nucleotides 43 to 701 of SEQ ID NO:21,
nucleotides 21 to 1394 of SEQ ID NO:23, and
nucleotides 41 to 1210 of SEQ ID NO:25.
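Each probe above is specified as the complementary strand of a 1-based, inclusive nucleotide range. A minimal Python sketch of that coordinate convention follows; the function name `probe_complement`, the toy sequence and the placeholder name `seq_id_no_1` are hypothetical. It extracts the range and returns its complementary strand written 5' to 3'.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def probe_complement(seq: str, start: int, end: int) -> str:
    """Complementary strand, written 5'->3', of nucleotides start..end (1-based, inclusive)."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("range lies outside the sequence")
    region = seq[start - 1:end]   # 1-based inclusive -> Python slice
    return region.translate(COMPLEMENT)[::-1]

# Toy sequence; "nucleotides 63 to 1493 of SEQ ID NO:1" would correspond to
# probe_complement(seq_id_no_1, 63, 1493), where seq_id_no_1 is a placeholder.
print(probe_complement("GATTACAGGC", 3, 7))   # TGTAA
```

Under this convention a range such as 63 to 1493 spans 1493 - 63 + 1 = 1431 nucleotides.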

As will be understood, details and particulars concerning hybridization of the nucleotide
sequences will be the same as or analogous to the hybridization aspects discussed in the
section entitled "Polypeptides Having Cellobiohydrolase II Activity" herein.

DNA recombination (shuffling) 

The nucleotide sequences of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID
NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17,
SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23 and SEQ ID NO:25 may be used
in a DNA recombination (or shuffling) process. The new polynucleotide sequences obtained
in such a process may encode new polypeptides having cellobiohydrolase II activity with
improved properties, such as improved stability (storage stability, thermostability), improved
specific activity, improved pH optimum, and/or improved tolerance towards specific compounds.

Shuffling between two or more homologous input polynucleotides (starting-point
polynucleotides) involves fragmenting the polynucleotides and recombining the fragments to
obtain output polynucleotides (i.e., polynucleotides that have been subjected to a shuffling
cycle) in which a number of nucleotide fragments are exchanged in comparison to the input
polynucleotides.
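To illustrate what "exchanging fragments" means for the output sequences, the toy Python model below builds chimeras from two aligned parents by switching parent at randomly chosen crossover points. This is a simplified in-silico analogy assumed for illustration only. It does not model fragmentation, annealing or the sequence-homology requirements of the laboratory shuffling formats cited in this section, and `shuffle_pair` is a hypothetical helper.

```python
import random

def shuffle_pair(parent_a: str, parent_b: str, n_crossovers: int,
                 rng: random.Random) -> str:
    """Return one chimera of two equal-length, pre-aligned parent sequences."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned to equal length")
    # Distinct crossover points strictly inside the sequence.
    points = sorted(rng.sample(range(1, len(parent_a)), n_crossovers))
    parents = (parent_a, parent_b)
    current = rng.randrange(2)  # which parent supplies the first segment
    pieces, start = [], 0
    for end in points + [len(parent_a)]:
        pieces.append(parents[current][start:end])
        current = 1 - current   # switch parent at each crossover
        start = end
    return "".join(pieces)

rng = random.Random(0)  # fixed seed so the toy library is reproducible
library = [shuffle_pair("AAAAAAAAAA", "CCCCCCCCCC", 2, rng) for _ in range(5)]
for chimera in library:
    print(chimera)
```

Every position of a chimera is inherited from one of the two parents at that position, which is the sense in which the output polynucleotides differ from the inputs by exchanged fragments.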

DNA recombination or shuffling may be a (partially) random process in which a library
of chimeric genes is generated from two or more starting genes. A number of known formats
can be used to carry out this shuffling or recombination process.

The process may involve random fragmentation of parental DNA followed by
reassembly by PCR to new full-length genes, e.g., as presented in US5605793, US5811238,
US5830721 and US6117679. In vitro recombination of genes may be carried out, e.g., as
described in US6159687, WO98/41623, US6159688, US5965408 and US6153510. The
recombination process may also take place in vivo in a living cell, e.g., as described in
WO 97/07205 and WO 98/28416.

The parental DNA may be fragmented by DNase I treatment or by restriction
endonuclease digests as described by Kikuchi et al. (2000a, Gene 236: 159-167). Shuffling
of two parents may be done by shuffling single-stranded parental DNA of the two parents as
described in Kikuchi et al. (2000b, Gene 243: 133-137).

A particular method of shuffling is to follow the methods described in Crameri et al.,
1998, Nature 391: 288-291 and Ness et al., Nature Biotechnology 17: 893-896. Another
format would be the methods described in US 6159687: Examples 1 and 2.

Nucleic Acid Constructs 

The present invention also relates to nucleic acid constructs comprising a nucleotide
sequence of the present invention operably linked to one or more control sequences that
direct the expression of the coding sequence in a suitable host cell under conditions
compatible with the control sequences.

A nucleotide sequence encoding a polypeptide of the present invention may be
manipulated in a variety of ways to provide for expression of the polypeptide. Manipulation of
the nucleotide sequence prior to its insertion into a vector may be desirable or necessary
depending on the expression vector. The techniques for modifying nucleotide sequences
utilizing recombinant DNA methods are well known in the art.

The control sequence may be an appropriate promoter sequence, a nucleotide
sequence which is recognized by a host cell for expression of the nucleotide sequence. The
promoter sequence contains transcriptional control sequences, which mediate the expression
of the polypeptide. The promoter may be any nucleotide sequence which shows
transcriptional activity in the host cell of choice, including mutant, truncated, and hybrid
promoters, and may be obtained from genes encoding extracellular or intracellular
polypeptides either homologous or heterologous to the host cell.

Examples of suitable promoters for directing the transcription of the nucleic acid
constructs of the present invention, especially in a bacterial host cell, are the promoters
obtained from the E. coli lac operon, Streptomyces coelicolor agarase gene (dagA), Bacillus
subtilis levansucrase gene (sacB), Bacillus licheniformis alpha-amylase gene (amyL), Bacillus
stearothermophilus maltogenic amylase gene (amyM), Bacillus amyloliquefaciens
alpha-amylase gene (amyQ), Bacillus licheniformis penicillinase gene (penP), Bacillus subtilis
xylA and xylB genes, and prokaryotic beta-lactamase gene (Villa-Kamaroff et al., 1978,
Proceedings of the National Academy of Sciences USA 75: 3727-3731), as well as the tac
promoter (DeBoer et al., 1983, Proceedings of the National Academy of Sciences USA 80:
21-25). Further promoters are described in "Useful proteins from recombinant bacteria" in
Scientific American, 1980, 242: 74-94; and in Sambrook et al., 1989, supra.

Examples of suitable promoters for directing the transcription of the nucleic acid 
constructs of the present invention in a filamentous fungal host cell are promoters obtained 
from the genes for Aspergillus oryzae TAKA amylase, Rhizomucor miehei aspartic
proteinase, Aspergillus niger neutral alpha-amylase, Aspergillus niger acid stable
alpha-amylase, Aspergillus niger or Aspergillus awamori glucoamylase (glaA), Rhizomucor
miehei lipase, Aspergillus oryzae alkaline protease, Aspergillus oryzae triose phosphate
isomerase, Aspergillus nidulans acetamidase, and Fusarium oxysporum trypsin-like protease (WO
96/00787), as well as the NA2-tpi promoter (a hybrid of the promoters from the genes for 
Aspergillus niger neutral alpha-amylase and Aspergillus oryzae triose phosphate isomerase), 
and mutant, truncated, and hybrid promoters thereof. 

In a yeast host, useful promoters are obtained from the genes for Saccharomyces
cerevisiae enolase (ENO-1), Saccharomyces cerevisiae galactokinase (GAL1),
Saccharomyces cerevisiae alcohol dehydrogenase/glyceraldehyde-3-phosphate
dehydrogenase (ADH2/GAP), and Saccharomyces cerevisiae 3-phosphoglycerate kinase.
Other useful promoters for yeast host cells are described by Romanos et al., 1992, Yeast 8:
423-488.

The control sequence may also be a suitable transcription terminator sequence, a
sequence recognized by a host cell to terminate transcription. The terminator sequence is
operably linked to the 3' terminus of the nucleotide sequence encoding the polypeptide. Any
terminator which is functional in the host cell of choice may be used in the present invention.

Preferred terminators for filamentous fungal host cells are obtained from the genes for
Aspergillus oryzae TAKA amylase, Aspergillus niger glucoamylase, Aspergillus nidulans
anthranilate synthase, Aspergillus niger alpha-glucosidase, and Fusarium oxysporum
trypsin-like protease.

Preferred terminators for yeast host cells are obtained from the genes for
Saccharomyces cerevisiae enolase, Saccharomyces cerevisiae cytochrome C (CYC1), and
Saccharomyces cerevisiae glyceraldehyde-3-phosphate dehydrogenase. Other useful
terminators for yeast host cells are described by Romanos et al., 1992, supra.

The control sequence may also be a suitable leader sequence, a nontranslated region 
of an mRNA which is important for translation by the host cell. The leader sequence is 
operably linked to the 5' terminus of the nucleotide sequence encoding the polypeptide. Any
leader sequence that is functional in the host cell of choice may be used in the present
invention. 

Preferred leaders for filamentous fungal host cells are obtained from the genes for
Aspergillus oryzae TAKA amylase and Aspergillus nidulans triose phosphate isomerase.

Suitable leaders for yeast host cells are obtained from the genes for Saccharomyces 
cerevisiae enolase (ENO-1), Saccharomyces cerevisiae 3-phosphoglycerate kinase,
Saccharomyces cerevisiae alpha-factor, and Saccharomyces cerevisiae alcohol 
dehydrogenase/glyceraldehyde-3-phosphate dehydrogenase (ADH2/GAP). 
The control sequence may also be a polyadenylation sequence, a sequence operably
linked to the 3' terminus of the nucleotide sequence and which, when transcribed, is
recognized by the host cell as a signal to add polyadenosine residues to transcribed mRNA.
Any polyadenylation sequence which is functional in the host cell of choice may be used in
the present invention.

Preferred polyadenylation sequences for filamentous fungal host cells are obtained 
from the genes for Aspergillus oryzae TAKA amylase, Aspergillus niger glucoamylase, 
Aspergillus nidulans anthranilate synthase, Fusarium oxysporum trypsin-like protease, and
Aspergillus niger alpha-glucosidase. 

Useful polyadenylation sequences for yeast host cells are described by Guo and
Sherman, 1995, Molecular Cellular Biology 15: 5983-5990.

The control sequence may also be a signal peptide coding region that codes for an
amino acid sequence linked to the amino terminus of a polypeptide and directs the encoded
polypeptide into the cell's secretory pathway. The 5' end of the coding sequence of the
nucleotide sequence may inherently contain a signal peptide coding region naturally linked in
translation reading frame with the segment of the coding region which encodes the secreted
polypeptide. Alternatively, the 5' end of the coding sequence may contain a signal peptide
coding region which is foreign to the coding sequence. The foreign signal peptide coding
region may be required where the coding sequence does not naturally contain a signal
peptide coding region. Alternatively, the foreign signal peptide coding region may simply
replace the natural signal peptide coding region in order to enhance secretion of the
polypeptide. However, any signal peptide coding region which directs the expressed
polypeptide into the secretory pathway of a host cell of choice may be used in the present
invention.

Effective signal peptide coding regions for bacterial host cells are the signal peptide
coding regions obtained from the genes for Bacillus NCIB 11837 maltogenic amylase,
Bacillus stearothermophilus alpha-amylase, Bacillus licheniformis subtilisin, Bacillus
licheniformis beta-lactamase, Bacillus stearothermophilus neutral proteases (nprT, nprS,
nprM), and Bacillus subtilis prsA. Further signal peptides are described by Simonen and
Palva, 1993, Microbiological Reviews 57: 109-137.

Effective signal peptide coding regions for filamentous fungal host cells are the signal 
peptide coding regions obtained from the genes for Aspergillus oryzae TAKA amylase, 
Aspergillus niger neutral amylase, Aspergillus niger glucoamylase, Rhizomucor miehei
aspartic proteinase, Humicola insolens cellulase, and Humicola lanuginosa lipase. 

Useful signal peptides for yeast host cells are obtained from the genes for
Saccharomyces cerevisiae alpha-factor and Saccharomyces cerevisiae invertase. Other
useful signal peptide coding regions are described by Romanos et al., 1992, supra.
The control sequence may also be a propeptide coding region that codes for an amino
acid sequence positioned at the amino terminus of a polypeptide. The resultant polypeptide is
known as a proenzyme or propolypeptide (or a zymogen in some cases). A propolypeptide is
generally inactive and can be converted to a mature active polypeptide by catalytic or
autocatalytic cleavage of the propeptide from the propolypeptide. The propeptide coding
region may be obtained from the genes for Bacillus subtilis alkaline protease (aprE), Bacillus
subtilis neutral protease (nprT), Saccharomyces cerevisiae alpha-factor, Rhizomucor miehei
aspartic proteinase, and Myceliophthora thermophila laccase (WO 95/33836).

Where both signal peptide and propeptide regions are present at the amino terminus of
a polypeptide, the propeptide region is positioned next to the amino terminus of a polypeptide
and the signal peptide region is positioned next to the amino terminus of the propeptide
region.

It may also be desirable to add regulatory sequences which allow the regulation of the
expression of the polypeptide relative to the growth of the host cell. Examples of regulatory
systems are those which cause the expression of the gene to be turned on or off in response
to a chemical or physical stimulus, including the presence of a regulatory compound.
Regulatory systems in prokaryotic systems include the lac, tac, and trp operator systems. In
yeast, the ADH2 system or GAL1 system may be used. In filamentous fungi, the TAKA
alpha-amylase promoter, Aspergillus niger glucoamylase promoter, and Aspergillus oryzae
glucoamylase promoter may be used as regulatory sequences. Other examples of regulatory
sequences are those which allow for gene amplification. In eukaryotic systems, these include
the dihydrofolate reductase gene, which is amplified in the presence of methotrexate, and the
metallothionein genes, which are amplified with heavy metals. In these cases, the nucleotide
sequence encoding the polypeptide would be operably linked with the regulatory sequence.


Expression Vectors 

The present invention also relates to recombinant expression vectors comprising the
nucleic acid construct of the invention. The various nucleotide and control sequences
described above may be joined together to produce a recombinant expression vector which
may include one or more convenient restriction sites to allow for insertion or substitution of
the nucleotide sequence encoding the polypeptide at such sites. Alternatively, the nucleotide
sequence of the present invention may be expressed by inserting the nucleotide sequence or
a nucleic acid construct comprising the sequence into an appropriate vector for expression.
In creating the expression vector, the coding sequence is located in the vector so that the
coding sequence is operably linked with the appropriate control sequences for expression.

The recombinant expression vector may be any vector (e.g., a plasmid or virus) which
can be conveniently subjected to recombinant DNA procedures and can bring about the
expression of the nucleotide sequence. The choice of the vector will typically depend on the
compatibility of the vector with the host cell into which the vector is to be introduced. The
vectors may be linear or closed circular plasmids.

The vector may be an autonomously replicating vector, i.e., a vector which exists as an
extrachromosomal entity, the replication of which is independent of chromosomal replication,
e.g., a plasmid, an extrachromosomal element, a minichromosome, or an artificial
chromosome.

The vector may contain any means for assuring self-replication. Alternatively, the vector
may be one which, when introduced into the host cell, is integrated into the genome and
replicated together with the chromosome(s) into which it has been integrated. Furthermore, a
single vector or plasmid or two or more vectors or plasmids which together contain the total
DNA to be introduced into the genome of the host cell, or a transposon, may be used.

The vectors of the present invention preferably contain one or more selectable markers
which permit easy selection of transformed cells. A selectable marker is a gene the product
of which provides for biocide or viral resistance, resistance to heavy metals, prototrophy to
auxotrophs, and the like.

Examples of bacterial selectable markers are the dal genes from Bacillus subtilis or
Bacillus licheniformis, or markers which confer antibiotic resistance such as ampicillin,
kanamycin, chloramphenicol or tetracycline resistance. Suitable markers for yeast host cells
are ADE2, HIS3, LEU2, LYS2, MET3, TRP1, and URA3. Selectable markers for use in a
filamentous fungal host cell include, but are not limited to, amdS (acetamidase), argB
(ornithine carbamoyltransferase), bar (phosphinothricin acetyltransferase), hygB (hygromycin
phosphotransferase), niaD (nitrate reductase), pyrG (orotidine-5'-phosphate decarboxylase),
sC (sulfate adenyltransferase), trpC (anthranilate synthase), as well as equivalents thereof.

Preferred for use in an Aspergillus cell are the amdS and pyrG genes of Aspergillus
nidulans or Aspergillus oryzae and the bar gene of Streptomyces hygroscopicus.

The vectors of the present invention preferably contain an element(s) that permits
stable integration of the vector into the host cell's genome or autonomous replication of the 
vector in the cell independent of the genome. 

For integration into the host cell genome, the vector may rely on the nucleotide
sequence encoding the polypeptide or any other element of the vector for stable integration
of the vector into the genome by homologous or nonhomologous recombination.
Alternatively, the vector may contain additional nucleotide sequences for directing integration
by homologous recombination into the genome of the host cell. The additional nucleotide
sequences enable the vector to be integrated into the host cell genome at a precise
location(s) in the chromosome(s). To increase the likelihood of integration at a precise
location, the integrational elements should preferably contain a sufficient number of
nucleotides, such as 100 to 1,500 base pairs, preferably 400 to 1,500 base pairs, and most
preferably 800 to 1,500 base pairs, which are highly homologous with the corresponding
target sequence to enhance the probability of homologous recombination. The integrational
elements may be any sequence that is homologous with the target sequence in the genome
of the host cell. Furthermore, the integrational elements may be non-encoding or encoding
nucleotide sequences. On the other hand, the vector may be integrated into the genome of
the host cell by non-homologous recombination.

For autonomous replication, the vector may further comprise an origin of replication
enabling the vector to replicate autonomously in the host cell in question. Examples of
bacterial origins of replication are the origins of replication of plasmids pBR322, pUC19,
pACYC177, and pACYC184 permitting replication in E. coli, and pUB110, pE194, pTA1060,
and pAMβ1 permitting replication in Bacillus. Examples of origins of replication for use in a
yeast host cell are the 2 micron origin of replication, ARS1, ARS4, the combination of ARS1
and CEN3, and the combination of ARS4 and CEN6. The origin of replication may be one
having a mutation which makes its functioning temperature-sensitive in the host cell (see,
e.g., Ehrlich, 1978, Proceedings of the National Academy of Sciences USA 75: 1433).

More than one copy of a nucleotide sequence of the present invention may be inserted
into the host cell to increase production of the gene product. An increase in the copy number
of the nucleotide sequence can be obtained by integrating at least one additional copy of the
sequence into the host cell genome or by including an amplifiable selectable marker gene
with the nucleotide sequence, where cells containing amplified copies of the selectable
marker gene, and thereby additional copies of the nucleotide sequence, can be selected for
by cultivating the cells in the presence of the appropriate selectable agent.

The procedures used to ligate the elements described above to construct the
recombinant expression vectors of the present invention are well known to one skilled in the
art (see, e.g., Sambrook et al., 1989, supra).

Host Cells 

The present invention also relates to recombinant host cells comprising the nucleic
acid construct of the invention, which are advantageously used in the recombinant production
of the polypeptides. A vector comprising a nucleotide sequence of the present invention is
introduced into a host cell so that the vector is maintained as a chromosomal integrant or as
a self-replicating extra-chromosomal vector as described earlier.

The host cell may be a unicellular microorganism, e.g., a prokaryote, or a
non-unicellular microorganism, e.g., a eukaryote.

Useful unicellular cells are bacterial cells such as gram positive bacteria including, but
not limited to, a Bacillus cell, e.g., Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus
brevis, Bacillus circulans, Bacillus clausii, Bacillus coagulans, Bacillus lautus, Bacillus lentus,
Bacillus licheniformis, Bacillus megaterium, Bacillus stearothermophilus, Bacillus subtilis, and
Bacillus thuringiensis; or a Streptomyces cell, e.g., Streptomyces lividans or Streptomyces
murinus; or gram negative bacteria such as E. coli and Pseudomonas sp. In a preferred
embodiment, the bacterial host cell is a Bacillus lentus, Bacillus licheniformis, Bacillus
stearothermophilus, or Bacillus subtilis cell. In another preferred embodiment, the Bacillus
cell is an alkalophilic Bacillus.

The introduction of a vector into a bacterial host cell may, for instance, be effected by
protoplast transformation (see, e.g., Chang and Cohen, 1979, Molecular General Genetics
168: 111-115), using competent cells (see, e.g., Young and Spizizen, 1961, Journal of
Bacteriology 81: 823-829, or Dubnau and Davidoff-Abelson, 1971, Journal of Molecular
Biology 56: 209-221), electroporation (see, e.g., Shigekawa and Dower, 1988, Biotechniques
6: 742-751), or conjugation (see, e.g., Koehler and Thorne, 1987, Journal of Bacteriology
169: 5771-5778).

The host cell may be a eukaryote, such as a mammalian, insect, plant, or fungal cell.

In a preferred embodiment, the host cell is a fungal cell. "Fungi" as used herein
includes the phyla Ascomycota, Basidiomycota, Chytridiomycota, and Zygomycota (as
defined by Hawksworth et al., in Ainsworth and Bisby's Dictionary of The Fungi, 8th edition,
1995, CAB International, University Press, Cambridge, UK) as well as the Oomycota (as cited
in Hawksworth et al., 1995, supra, page 171) and all mitosporic fungi (Hawksworth et al.,
1995, supra).

In a more preferred embodiment, the fungal host cell is a yeast cell. "Yeast" as used
herein includes ascosporogenous yeast (Endomycetales), basidiosporogenous yeast, and
yeast belonging to the Fungi Imperfecti (Blastomycetes). Since the classification of yeast may
change in the future, for the purposes of this invention, yeast shall be defined as described in
Biology and Activities of Yeast (Skinner, F.A., Passmore, S.M., and Davenport, R.R., eds,
Soc. App. Bacteriol. Symposium Series No. 9, 1980).

In an even more preferred embodiment, the yeast host cell is a Candida, Ashbya,
Hansenula, Kluyveromyces, Pichia, Saccharomyces, Schizosaccharomyces, or Yarrowia cell. 

In a most preferred embodiment, the yeast host cell is a Saccharomyces
carlsbergensis, Saccharomyces cerevisiae, Saccharomyces diastaticus, Saccharomyces
douglasii, Saccharomyces kluyveri, Saccharomyces norbensis or Saccharomyces oviformis
cell. In another most preferred embodiment, the yeast host cell is a Kluyveromyces lactis cell.
In another most preferred embodiment, the yeast host cell is a Yarrowia lipolytica cell.

In another more preferred embodiment, the fungal host cell is a filamentous fungal cell.
"Filamentous fungi" include all filamentous forms of the subdivision Eumycota and Oomycota
(as defined by Hawksworth et al., 1995, supra). The filamentous fungi are characterized by a
mycelial wall composed of chitin, cellulose, glucan, chitosan, mannan, and other complex
polysaccharides. Vegetative growth is by hyphal elongation and carbon catabolism is
obligately aerobic. In contrast, vegetative growth by yeasts such as Saccharomyces
cerevisiae is by budding of a unicellular thallus and carbon catabolism may be fermentative.

In an even more preferred embodiment, the filamentous fungal host cell is a cell of a
species of, but not limited to, Acremonium, Aspergillus, Fusarium, Humicola, Mucor,
Myceliophthora, Neurospora, Penicillium, Thielavia, Tolypocladium, or Trichoderma.

In a most preferred embodiment, the filamentous fungal host cell is an Aspergillus
awamori, Aspergillus foetidus, Aspergillus japonicus, Aspergillus nidulans, Aspergillus niger
or Aspergillus oryzae cell. In another most preferred embodiment, the filamentous fungal host
cell is a Fusarium bactridioides, Fusarium cerealis, Fusarium crookwellense, Fusarium
culmorum, Fusarium graminearum, Fusarium graminum, Fusarium heterosporum, Fusarium
negundi, Fusarium oxysporum, Fusarium reticulatum, Fusarium roseum, Fusarium
sambucinum, Fusarium sarcochroum, Fusarium sporotrichioides, Fusarium sulphureum,
Fusarium torulosum, Fusarium trichothecioides, or Fusarium venenatum cell. In an even
most preferred embodiment, the filamentous fungal parent cell is a Fusarium venenatum
(Nirenberg sp. nov.) cell. In another most preferred embodiment, the filamentous fungal host
cell is a Humicola insolens, Humicola lanuginosa, Mucor miehei, Myceliophthora thermophila,
Neurospora crassa, Penicillium purpurogenum, Thielavia terrestris, Trichoderma harzianum,
Trichoderma koningii, Trichoderma longibrachiatum, Trichoderma reesei, or Trichoderma
viride cell.

Fungal cells may be transformed by a process involving protoplast formation,
transformation of the protoplasts, and regeneration of the cell wall in a manner known per se.
Suitable procedures for transformation of Aspergillus host cells are described in EP 238 023
and Yelton et al., 1984, Proceedings of the National Academy of Sciences USA 81:
1470-1474. Suitable methods for transforming Fusarium species are described by Malardier
et al., 1989, Gene 78: 147-156 and WO 96/00787. Yeast may be transformed using the
procedures described by Becker and Guarente, in Abelson, J.N. and Simon, M.I., editors,
Guide to Yeast Genetics and Molecular Biology, Methods in Enzymology, Volume 194, pp
182-187, Academic Press, Inc., New York; Ito et al., 1983, Journal of Bacteriology 153: 163;
and Hinnen et al., 1978, Proceedings of the National Academy of Sciences USA 75: 1920.

Methods of Production 

The present invention also relates to methods for producing a polypeptide of the
present invention comprising (a) cultivating a strain, which in its wild-type form is capable of
producing the polypeptide; and (b) recovering the polypeptide. Preferably, the strain is
selected from a species within a genus selected from the group consisting of Acremonium,
Aspergillus, Chaetomium, Gloeophyllum, Malbrancheae, Melanocarpus, Meripilus,
Myceliophthora, Stilbella, Thielavia and Trichophaea; more preferably the strain is
selected from the group consisting of Chaetomium thermophilum, Myceliophthora
thermophila, Thielavia australiensis, Thielavia microspora, Aspergillus sp., the black
Aspergilli, Aspergillus tubingensis syn. A. neotubingensis Frisvad sp. nov., Gloeophyllum
trabeum, Meripilus giganteus, Trichophaea saccata, Stilbella annulata, and Malbrancheae
cinnamomea.

The present invention also relates to methods for producing a polypeptide of the
present invention comprising (a) cultivating a host cell under conditions conducive for
production of the polypeptide; and (b) recovering the polypeptide.

The present invention also relates to methods for in-situ production of a polypeptide of
the present invention comprising (a) cultivating a host cell under conditions conducive for
production of the polypeptide; and (b) contacting the polypeptide with a desired substrate,
such as a cellulosic substrate, without prior recovery of the polypeptide. The term "in-situ
production" is intended to mean that the polypeptide is produced directly in the locus in which
it is intended to be used, such as in a fermentation process for production of ethanol.

In the production methods of the present invention, the cells are cultivated in a nutrient
medium suitable for production of the polypeptide using methods known in the art. For
example, the cell may be cultivated by shake flask cultivation, small-scale or large-scale
fermentation (including continuous, batch, fed-batch, or solid state fermentations) in
laboratory or industrial fermentors performed in a suitable medium and under conditions
allowing the polypeptide to be expressed and/or isolated. The cultivation takes place in a
suitable nutrient medium comprising carbon and nitrogen sources and inorganic salts, using
procedures known in the art. Suitable media are available from commercial suppliers or may
be prepared according to published compositions (e.g., in catalogues of the American Type
Culture Collection). If the polypeptide is secreted into the nutrient medium, the polypeptide
can be recovered directly from the medium. If the polypeptide is not secreted, it can be
recovered from cell lysates.

The polypeptides may be detected using methods known in the art that are specific for
the polypeptides. These detection methods may include use of specific antibodies, formation
of an enzyme product, or disappearance of an enzyme substrate. For example, an enzyme
assay may be used to determine the activity of the polypeptide as described herein.

The resulting polypeptide may be recovered by methods known in the art. For example,
the polypeptide may be recovered from the nutrient medium by conventional procedures
including, but not limited to, centrifugation, filtration, extraction, spray-drying, evaporation, or
precipitation.

The polypeptides of the present invention may be purified by a variety of procedures
known in the art including, but not limited to, chromatography (e.g., ion exchange, affinity,
hydrophobic, chromatofocusing, and size exclusion), electrophoretic procedures (e.g.,
preparative isoelectric focusing), differential solubility (e.g., ammonium sulfate precipitation),
SDS-PAGE, or extraction (see, e.g., Protein Purification, J.-C. Janson and Lars Ryden,
editors, VCH Publishers, New York, 1989).



Plants 

The present invention also relates to a transgenic plant, plant part, or plant cell which
has been transformed with a nucleotide sequence encoding a polypeptide having
cellobiohydrolase II activity of the present invention so as to express and produce the
polypeptide in recoverable quantities. The polypeptide may be recovered from the plant or
plant part. Alternatively, the plant or plant part containing the recombinant polypeptide may
be used as such for improving the quality of a food or feed, e.g., improving nutritional value,
palatability, and rheological properties, or to destroy an antinutritive factor.

The transgenic plant can be dicotyledonous (a dicot) or monocotyledonous (a
monocot). Examples of monocot plants are grasses, such as meadow grass (blue grass,
Poa), forage grass such as Festuca, Lolium, temperate grass, such as Agrostis, and cereals,
e.g., wheat, oats, rye, barley, rice, sorghum, millets, and maize (corn).

Examples of dicot plants are tobacco, lupins, potato, sugar beet, legumes, such as pea,
bean and soybean, and cruciferous plants (family Brassicaceae), such as cauliflower, rape,
canola, and the closely related model organism Arabidopsis thaliana.

Examples of plant parts are stem, callus, leaves, root, fruits, seeds, and tubers. Also,
specific plant tissues, such as chloroplast, apoplast, mitochondria, vacuole, peroxisomes, and
cytoplasm, are considered to be a plant part. Furthermore, any plant cell, whatever the tissue
origin, is considered to be a plant part.

Also included within the scope of the present invention are the progeny (clonal or seed) 
of such plants, plant parts and plant cells. 

The transgenic plant or plant cell expressing a polypeptide of the present invention may
be constructed in accordance with methods known in the art. Briefly, the plant or plant cell is
constructed by incorporating one or more expression constructs encoding a polypeptide of
the present invention into the plant host genome and propagating the resulting modified plant
or plant cell into a transgenic plant or plant cell.

Conveniently, the expression construct is a nucleic acid construct which comprises a
nucleotide sequence encoding a polypeptide of the present invention operably linked with
appropriate regulatory sequences required for expression of the nucleotide sequence in the
plant or plant part of choice. Furthermore, the expression construct may comprise a
selectable marker useful for identifying host cells into which the expression construct has
been integrated and DNA sequences necessary for introduction of the construct into the plant
in question (the latter depends on the DNA introduction method to be used).

The choice of regulatory sequences, such as promoter and terminator sequences and
optionally signal or transit sequences, is determined, for example, on the basis of when,
where, and how the polypeptide is desired to be expressed. For instance, the expression of
the gene encoding a polypeptide of the present invention may be constitutive or inducible, or
may be developmental, stage or tissue specific, and the gene product may be targeted to a
specific tissue or plant part such as seeds or leaves. Regulatory sequences are, for example,
described by Tague et al., 1988, Plant Physiology 86: 506.

For constitutive expression, the 35S-CaMV promoter may be used (Franck et al., 1980,
Cell 21: 285-294). Organ-specific promoters may be, for example, a promoter from storage
sink tissues such as seeds, potato tubers, and fruits (Edwards & Coruzzi, 1990, Ann. Rev.
Genet. 24: 275-303), or from metabolic sink tissues such as meristems (Ito et al., 1994, Plant
Mol. Biol. 24: 863-878), a seed-specific promoter such as the glutelin, prolamin, globulin, or
albumin promoter from rice (Wu et al., 1998, Plant and Cell Physiology 39: 885-889), a Vicia
faba promoter from the legumin B4 and the unknown seed protein gene from Vicia faba
(Conrad et al., 1998, Journal of Plant Physiology 152: 708-711), a promoter from a seed oil
body protein (Chen et al., 1998, Plant and Cell Physiology 39: 935-941), the storage protein
napA promoter from Brassica napus, or any other seed-specific promoter known in the art,
e.g., as described in WO 91/14772. Furthermore, the promoter may be a leaf-specific
promoter such as the rbcs promoter from rice or tomato (Kyozuka et al., 1993, Plant
Physiology 102: 991-1000), the chlorella virus adenine methyltransferase gene promoter
(Mitra and Higgins, 1994, Plant Molecular Biology 26: 85-93), or the aldP gene promoter from
rice (Kagaya et al., 1995, Molecular and General Genetics 248: 668-674), or a wound-inducible
promoter such as the potato pin2 promoter (Xu et al., 1993, Plant Molecular Biology
22: 573-588).

A promoter enhancer element may also be used to achieve higher expression of the
enzyme in the plant. For instance, the promoter enhancer element may be an intron which is
placed between the promoter and the nucleotide sequence encoding a polypeptide of the
present invention. For example, Xu et al., 1993, supra, disclose the use of the first intron of
the rice actin 1 gene to enhance expression.

The selectable marker gene and any other parts of the expression construct may be 
chosen from those available in the art. 

The nucleic acid construct is incorporated into the plant genome according to
conventional techniques known in the art, including Agrobacterium-mediated transformation,
virus-mediated transformation, microinjection, particle bombardment, biolistic transformation,
and electroporation (Gasser et al., 1990, Science 244: 1293; Potrykus, 1990, Bio/Technology
8: 535; Shimamoto et al., 1989, Nature 338: 274).

Presently, Agrobacterium tumefaciens-mediated gene transfer is the method of choice
for generating transgenic dicots (for a review, see Hooykas and Schilperoort, 1992, Plant
Molecular Biology 19: 15-38). However, it can also be used for transforming monocots,
although other transformation methods are generally preferred for these plants. Presently,
the method of choice for generating transgenic monocots is particle bombardment
(microscopic gold or tungsten particles coated with the transforming DNA) of embryonic calli
or developing embryos (Christou, 1992, Plant Journal 2: 275-281; Shimamoto, 1994, Current
Opinion Biotechnology 5: 158-162; Vasil et al., 1992, Bio/Technology 10: 667-674). An
alternative method for transformation of monocots is based on protoplast transformation as
described by Omirulleh et al., 1993, Plant Molecular Biology 21: 415-428.

Following transformation, the transformants having incorporated therein the expression
construct are selected and regenerated into whole plants according to methods well known in
the art.

The present invention also relates to methods for producing a polypeptide of the
present invention comprising (a) cultivating a transgenic plant or a plant cell comprising a
nucleotide sequence encoding a polypeptide having cellobiohydrolase II activity of the
present invention under conditions conducive for production of the polypeptide; and (b)
recovering the polypeptide.

The present invention also relates to methods for in-situ production of a polypeptide of
the present invention comprising (a) cultivating a transgenic plant or a plant cell comprising a
nucleotide sequence encoding a polypeptide having cellobiohydrolase II activity of the
present invention under conditions conducive for production of the polypeptide; and (b)
contacting the polypeptide with a desired substrate, such as a cellulosic substrate, without
prior recovery of the polypeptide.

Compositions 

In a still further aspect, the present invention relates to compositions comprising a 
polypeptide of the present invention. 

The composition may comprise a polypeptide of the invention as the major enzymatic
component, e.g., a mono-component composition. Alternatively, the composition may
comprise multiple enzymatic activities, such as an aminopeptidase, amylase, carbohydrase,
carboxypeptidase, catalase, cellulase, chitinase, cutinase, cyclodextrin glycosyltransferase,
deoxyribonuclease, esterase, alpha-galactosidase, beta-galactosidase, glucoamylase,
alpha-glucosidase, beta-glucosidase, haloperoxidase, invertase, laccase, lipase, mannosidase,
oxidase, pectinolytic enzyme, peptidoglutaminase, peroxidase, phytase, polyphenoloxidase,
proteolytic enzyme, ribonuclease, transglutaminase, or xylanase.
The compositions may be prepared in accordance with methods known in the art and
may be in the form of a liquid or a dry composition. For instance, the polypeptide composition
may be in the form of a granulate or a microgranulate. The polypeptide to be included in the
composition may be stabilized in accordance with methods known in the art.

Examples are given below of preferred uses of the polypeptide compositions of the
invention. The dosage of the polypeptide composition of the invention and other conditions
under which the composition is used may be determined on the basis of methods known in
the art.



Detergent Compositions

The polypeptide of the invention may be added to and thus become a component of a
detergent composition.

The detergent composition of the invention may, for example, be formulated as a hand
or machine laundry detergent composition including a laundry additive composition suitable
for pre-treatment of stained fabrics and a rinse-added fabric softener composition, or be
formulated as a detergent composition for use in general household hard surface cleaning
operations, or be formulated for hand or machine dishwashing operations.

In a specific aspect, the invention provides a detergent additive comprising the
polypeptide of the invention. The detergent additive as well as the detergent composition may
comprise one or more other enzymes such as a protease, a lipase, a cutinase, an amylase, a
carbohydrase, a cellulase, a pectinase, a mannanase, an arabinase, a galactanase, a
xylanase, an oxidase, e.g., a laccase, and/or a peroxidase.

In general, the properties of the chosen enzyme(s) should be compatible with the
selected detergent (i.e., pH optimum, compatibility with other enzymatic and non-enzymatic
ingredients, etc.), and the enzyme(s) should be present in effective amounts.

Proteases: Suitable proteases include those of animal, vegetable or microbial origin.
Microbial origin is preferred. Chemically modified or protein engineered mutants are included.
The protease may be a serine protease or a metallo protease, preferably an alkaline
microbial protease or a trypsin-like protease. Examples of alkaline proteases are subtilisins,
especially those derived from Bacillus, e.g., subtilisin Novo, subtilisin Carlsberg, subtilisin
309, subtilisin 147 and subtilisin 168 (described in WO 89/06279). Examples of trypsin-like
proteases are trypsin (e.g., of porcine or bovine origin) and the Fusarium protease described
in WO 89/06270 and WO 94/25583.

Examples of useful proteases are the variants described in WO 92/19729, WO
98/20115, WO 98/20116, and WO 98/34946, especially the variants with substitutions in one
or more of the following positions: 27, 36, 57, 76, 87, 97, 101, 104, 120, 123, 167, 170, 194,
206, 218, 222, 224, 235 and 274.
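The position list lends itself to a simple screening check. The Python sketch below assumes the common "original residue, position, new residue" notation (e.g., S101G) and uses hypothetical substitutions; it does not reproduce the position-numbering rules of the cited applications, which should be consulted for how positions are assigned.

```python
import re

# Positions listed above for protease variants (numbering as defined in the
# cited applications; this sketch simply takes the integers at face value).
PROTEASE_POSITIONS = frozenset({
    27, 36, 57, 76, 87, 97, 101, 104, 120, 123,
    167, 170, 194, 206, 218, 222, 224, 235, 274,
})

# ASSUMED notation: original residue, position, new residue, e.g. "S101G".
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def listed_positions_hit(substitutions, positions=PROTEASE_POSITIONS):
    """Return the sorted listed positions touched by a set of substitutions."""
    hits = set()
    for sub in substitutions:
        match = SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub!r}")
        position = int(match.group(2))
        if position in positions:
            hits.add(position)
    return sorted(hits)


# Hypothetical variant; the residues are illustrative only.
print(listed_positions_hit(["S101G", "N76D", "A48V"]))
```

The same helper could be reused for the amylase position list below by passing a different `positions` set.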
Lipases: Suitable lipases include those of bacterial or fungal origin. Chemically modified or
protein engineered mutants are included. Examples of useful lipases include lipases from
Humicola (synonym Thermomyces), e.g., from H. lanuginosa (T. lanuginosus) as described in
EP 258 068 and EP 305 216 or from H. insolens as described in WO 96/13580, a
Pseudomonas lipase, e.g., from P. alcaligenes or P. pseudoalcaligenes (EP 218 272), P.
cepacia (EP 331 376), P. stutzeri (GB 1,372,034), P. fluorescens, Pseudomonas sp. strain
SD 705 (WO 95/06720 and WO 96/27002), P. wisconsinensis (WO 96/12012), a Bacillus
lipase, e.g., from B. subtilis (Dartois et al. (1993), Biochemica et Biophysica Acta, 1131,
253-360), B. stearothermophilus (JP 64/744992) or B. pumilus (WO 91/16422).

Other examples are lipase variants such as those described in WO 92/05249, WO
94/01541, EP 407 225, EP 260 105, WO 95/35381, WO 96/00292, WO 95/30744, WO
94/25578, WO 95/14783, WO 95/22615, WO 97/04079 and WO 97/07202.
Amylases: Suitable amylases (alpha and/or beta) include those of bacterial or fungal origin.
Chemically modified or protein engineered mutants are included. Amylases include, for
example, alpha-amylases obtained from Bacillus, e.g., a special strain of B. licheniformis,
described in more detail in GB 1,296,839.

Examples of useful amylases are the variants described in WO 94/02597, WO
94/18314, WO 96/23873, and WO 97/43424, especially the variants with substitutions in one
or more of the following positions: 15, 23, 105, 106, 124, 128, 133, 154, 156, 181, 188, 190,
197, 202, 208, 209, 243, 264, 304, 305, 391, 408, and 444.

Cellulases: Suitable cellulases include those of bacterial or fungal origin. Chemically modified
or protein engineered mutants are included. Suitable cellulases include cellulases from the
genera Bacillus, Pseudomonas, Humicola, Fusarium, Thielavia, Acremonium, e.g., the fungal
cellulases produced from Humicola insolens, Myceliophthora thermophila and Fusarium
oxysporum disclosed in US 4,435,307, US 5,648,263, US 5,691,178, US 5,776,757 and WO
89/09259.

Especially suitable cellulases are the alkaline or neutral cellulases having colour care
benefits. Examples of such cellulases are cellulases described in EP 0 495 257, EP 0 531
372, WO 96/11262, WO 96/29397, and WO 98/08940. Other examples are cellulase variants
such as those described in WO 94/07998, EP 0 531 315, US 5,457,046, US 5,686,593, US
5,763,254, WO 95/24471, WO 98/12307 and PCT/DK98/00299.

Peroxidases/Oxidases: Suitable peroxidases/oxidases include those of plant, bacterial or
fungal origin. Chemically modified or protein engineered mutants are included. Examples of
useful peroxidases include peroxidases from Coprinus, e.g., from C. cinereus, and variants
thereof such as those described in WO 93/24618, WO 95/10602, and WO 98/15257.

The detergent enzyme(s) may be included in a detergent composition by adding
separate additives containing one or more enzymes, or by adding a combined additive
comprising all of these enzymes. A detergent additive of the invention, i.e., a separate
additive or a combined additive, can be formulated, e.g., as a granulate, a liquid, a slurry, etc.
Preferred detergent additive formulations are granulates, in particular non-dusting granulates,
liquids, in particular stabilized liquids, or slurries.
Non-dusting granulates may be produced, e.g., as disclosed in US 4,106,991 and
4,661,452 and may optionally be coated by methods known in the art. Examples of waxy
coating materials are poly(ethylene oxide) products (polyethyleneglycol, PEG) with mean
molar weights of 1000 to 20000; ethoxylated nonylphenols having from 16 to 50 ethylene
oxide units; ethoxylated fatty alcohols in which the alcohol contains from 12 to 20 carbon
atoms and in which there are 15 to 80 ethylene oxide units; fatty alcohols; fatty acids; and
mono- and di- and triglycerides of fatty acids. Examples of film-forming coating materials
suitable for application by fluid bed techniques are given in GB 1483591. Liquid enzyme
preparations may, for instance, be stabilized by adding a polyol such as propylene glycol, a
sugar or sugar alcohol, lactic acid or boric acid according to established methods. Protected
enzymes may be prepared according to the method disclosed in EP 238,216.

The detergent composition of the invention may be in any convenient form, e.g., a bar, 
a tablet, a powder, a granule, a paste or a liquid. A liquid detergent may be aqueous, typically 
containing up to 70 % water and 0-30 % organic solvent, or non-aqueous. 

The detergent composition comprises one or more surfactants, which may be non-ionic
including semi-polar and/or anionic and/or cationic and/or zwitterionic. The surfactants are
typically present at a level of from 0.1% to 60% by weight.

When included therein, the detergent will usually contain from about 1% to about 40%
of an anionic surfactant such as linear alkylbenzenesulfonate, alpha-olefinsulfonate, alkyl
sulfate (fatty alcohol sulfate), alcohol ethoxysulfate, secondary alkanesulfonate, alpha-sulfo
fatty acid methyl ester, alkyl- or alkenylsuccinic acid or soap.

When included therein, the detergent will usually contain from about 0.2% to about 40%
of a non-ionic surfactant such as alcohol ethoxylate, nonylphenol ethoxylate,
alkylpolyglycoside, alkyldimethylamine oxide, ethoxylated fatty acid monoethanolamide, fatty
acid monoethanolamide, polyhydroxy alkyl fatty acid amide, or N-acyl N-alkyl derivatives of
glucosamine ("glucamides").

The detergent may contain 0-65% of a detergent builder or complexing agent such as
zeolite, diphosphate, triphosphate, phosphonate, carbonate, citrate, nitrilotriacetic acid,
ethylenediaminetetraacetic acid, diethylenetriaminepentaacetic acid, alkyl- or alkenylsuccinic
acid, soluble silicates or layered silicates (e.g., SKS-6 from Hoechst).
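As a rough plausibility check, the typical ranges given above for surfactants and builders can be encoded as a small helper. This is a hypothetical Python sketch, not part of the invention; the text states these levels as typical or usual, so a flagged value is a prompt for review rather than a hard error.

```python
# Typical weight-percentage ranges stated above for a detergent composition.
# The text gives these as "typical"/"usual" levels, not hard limits.
SURFACTANT_KEYS = ("anionic", "nonionic", "cationic", "zwitterionic")


def check_formulation(pct: dict) -> list:
    """Return messages for components outside the stated typical ranges."""
    issues = []
    total = sum(pct.get(key, 0.0) for key in SURFACTANT_KEYS)
    if not 0.1 <= total <= 60.0:
        issues.append(f"total surfactant {total}% outside 0.1-60%")
    # The anionic and non-ionic ranges apply only "when included".
    for key, low, high in (("anionic", 1.0, 40.0), ("nonionic", 0.2, 40.0)):
        value = pct.get(key, 0.0)
        if value > 0 and not low <= value <= high:
            issues.append(f"{key} surfactant {value}% outside about {low}-{high}%")
    builder = pct.get("builder", 0.0)
    if not 0.0 <= builder <= 65.0:
        issues.append(f"builder {builder}% outside 0-65%")
    return issues


# Hypothetical formulations (weight %).
print(check_formulation({"anionic": 15.0, "nonionic": 5.0, "builder": 25.0}))
print(check_formulation({"anionic": 45.0, "nonionic": 5.0, "builder": 70.0}))
```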
The detergent may comprise one or more polymers. Examples are
carboxymethylcellulose, poly(vinylpyrrolidone), poly(ethylene glycol), poly(vinyl alcohol),
poly(vinylpyridine-N-oxide), poly(vinylimidazole), polycarboxylates such as polyacrylates,
maleic/acrylic acid copolymers and lauryl methacrylate/acrylic acid copolymers.

The detergent may contain a bleaching system which may comprise an H2O2 source
such as perborate or percarbonate, which may be combined with a peracid-forming bleach
activator such as tetraacetylethylenediamine or nonanoyloxybenzenesulfonate. Alternatively,
the bleaching system may comprise peroxyacids of, e.g., the amide, imide, or sulfone type.

The enzyme(s) of the detergent composition of the invention may be stabilized using
conventional stabilizing agents, e.g., a polyol such as propylene glycol or glycerol, a sugar or
sugar alcohol, lactic acid, boric acid, or a boric acid derivative, e.g., an aromatic borate ester,
or a phenyl boronic acid derivative such as 4-formylphenyl boronic acid, and the composition
may be formulated as described in, e.g., WO 92/19709 and WO 92/19708.

The detergent may also contain other conventional detergent ingredients such as
fabric conditioners including clays, foam boosters, suds suppressors, anti-corrosion agents,
soil-suspending agents, anti-soil redeposition agents, dyes, bactericides, optical brighteners,
hydrotropes, tarnish inhibitors, or perfumes.

It is at present contemplated that in the detergent compositions any enzyme, in
particular the polypeptide of the invention, may be added in an amount corresponding to
0.01-100 mg of enzyme protein per liter of wash liquor, preferably 0.05-5 mg of enzyme
protein per liter of wash liquor, in particular 0.1-1 mg of enzyme protein per liter of wash
liquor.
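The dosage ranges translate into an amount of enzyme protein once a wash-liquor volume is chosen. The sketch below assumes a hypothetical 15 l wash and also reports which of the stated ranges a dose falls into; the function names are illustrative only.

```python
# Dosage ranges stated above (mg enzyme protein per liter of wash liquor),
# ordered from the narrowest to the broadest.
DOSAGE_RANGES = [
    ("in particular", 0.1, 1.0),
    ("preferably", 0.05, 5.0),
    ("contemplated range", 0.01, 100.0),
]


def dosage_tier(mg_per_l: float):
    """Return the narrowest stated range containing the dose, or None."""
    for label, low, high in DOSAGE_RANGES:
        if low <= mg_per_l <= high:
            return label
    return None


def enzyme_protein_mg(mg_per_l: float, wash_liquor_l: float) -> float:
    """Total enzyme protein (mg) needed for a given wash-liquor volume."""
    return mg_per_l * wash_liquor_l


# Hypothetical wash: 15 l of wash liquor dosed at 0.5 mg enzyme protein per liter.
dose = 0.5
print(dosage_tier(dose), enzyme_protein_mg(dose, 15.0), "mg")
```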

The polypeptide of the invention may additionally be incorporated in the detergent
formulations disclosed in WO 97/07202, which is hereby incorporated by reference.



Production of Ethanol from Biomass 

The present invention also relates to methods for producing ethanol from biomass,
such as cellulosic materials, comprising contacting the biomass with the polypeptides of the
invention. Ethanol may subsequently be recovered. The polypeptides of the invention may be
produced "in-situ", i.e., as part of, or directly in, an ethanol production process, by cultivating a
host cell or a strain which in its wild-type form is capable of producing the polypeptides,
under conditions conducive for production of the polypeptides.
Ethanol can be produced by enzymatic degradation of biomass and conversion of the
released sugars to ethanol. This kind of ethanol is often referred to as bioethanol or
biofuel. It can be used as a fuel additive or extender in blends of from less than 1% and up to
100% (a fuel substitute). In some countries, such as Brazil, ethanol is substituting for gasoline to
a very large extent.

The predominant polysaccharide in the primary cell wall of biomass is cellulose, the
second most abundant is hemicellulose, and the third is pectin. The secondary cell wall,
produced after the cell has stopped growing, also contains polysaccharides and is
strengthened through polymeric lignin covalently cross-linked to hemicellulose. Cellulose is a
homopolymer of anhydrocellobiose and thus a linear beta-(1-4)-D-glucan, while
hemicelluloses include a variety of compounds, such as xylans, xyloglucans, arabinoxylans,
and mannans in complex branched structures with a spectrum of substituents. Although
generally polymorphous, cellulose is found in plant tissue primarily as an insoluble crystalline
matrix of parallel glucan chains. Hemicelluloses usually hydrogen bond to cellulose, as well
as to other hemicelluloses, which helps stabilize the cell wall matrix.

Three major classes of cellulase enzymes are used to break down biomass:

• The "endo-1,4-beta-glucanases" or 1,4-beta-D-glucan-4-glucanohydrolases (EC 3.2.1.4),
which act randomly on soluble and insoluble 1,4-beta-glucan substrates.

• The "exo-1,4-beta-D-glucanases", including both the 1,4-beta-D-glucan glucohydrolases
(EC 3.2.1.74), which liberate D-glucose from 1,4-beta-D-glucans and hydrolyze
D-cellobiose slowly, and 1,4-beta-D-glucan cellobiohydrolase (EC 3.2.1.91), also referred to
as cellobiohydrolase I and II, which liberates D-cellobiose from 1,4-beta-glucans.

• The "beta-D-glucosidases" or beta-D-glucoside glucohydrolases (EC 3.2.1.21), which act
to release D-glucose units from cellobiose and soluble cellodextrins, as well as an array
of glycosides.

These three classes of enzymes work together synergistically in a complex interplay
that results in efficient decrystallization and hydrolysis of native cellulose from biomass to
yield the reducing sugars, which are converted to ethanol by fermentation.
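For orientation, the stoichiometric ceiling on this conversion can be computed from the hydrolysis of cellulose to glucose, (C6H10O5)n + n H2O -> n C6H12O6, followed by fermentation, C6H12O6 -> 2 C2H5OH + 2 CO2. This is a textbook mass balance with approximate molar masses, not a yield reported for the polypeptides of the invention; practical yields are lower because hydrolysis and fermentation are incomplete.

```python
# Approximate molar masses (g/mol).
ANHYDROGLUCOSE = 162.14  # C6H10O5, one glucose residue within cellulose
GLUCOSE = 180.16         # C6H12O6
ETHANOL = 46.07          # C2H5OH


def max_ethanol_per_g_glucose() -> float:
    """Fermentation: C6H12O6 -> 2 C2H5OH + 2 CO2 (mass basis)."""
    return 2 * ETHANOL / GLUCOSE


def max_ethanol_per_g_cellulose() -> float:
    """Hydrolysis adds one water per residue; each residue gives one glucose,
    so per gram of cellulose the ceiling is 2 * ETHANOL / ANHYDROGLUCOSE."""
    return 2 * ETHANOL / ANHYDROGLUCOSE


print(f"{max_ethanol_per_g_glucose():.3f} g ethanol per g glucose")
print(f"{max_ethanol_per_g_cellulose():.3f} g ethanol per g cellulose")
```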

The present invention is further described by the following examples, which should not
be construed as limiting the scope of the invention.

EXAMPLES 

Chemicals used as buffers and substrates were commercial products of at least 
reagent grade. 

Example 1 

Molecular screening of cellobiohydrolase II from thermophilic fungi

The fungal strains were grown in 80 ml liquid medium (2.5% Avicel, 0.5% glucose, 0.14%
(NH4)2SO4) in 500 ml Erlenmeyer flasks. The flasks were incubated for 72 hours at 45°C
on a rotary shaker at 165 rpm. Mycelium was harvested by centrifugation at 7000 rpm for 30
minutes and stored at -80°C before use for RNA extraction.
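The percentages translate into weighed amounts per flask as follows. This sketch assumes the percentages are w/v (grams per 100 ml), which the text does not state explicitly.

```python
# Medium for one flask, as listed above. ASSUMPTION: percentages are w/v
# (g per 100 ml); the original text does not state this explicitly.
MEDIUM_PERCENT_W_V = {
    "Avicel": 2.5,
    "glucose": 0.5,
    "(NH4)2SO4": 0.14,
}


def grams_per_flask(volume_ml: float, recipe: dict) -> dict:
    """Grams of each component needed for volume_ml of medium."""
    return {name: pct / 100.0 * volume_ml for name, pct in recipe.items()}


for name, grams in grams_per_flask(80.0, MEDIUM_PERCENT_W_V).items():
    print(f"{name}: {grams:.3f} g")
```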

Total RNA was extracted from 100 mg mycelium of each strain using the RNeasy Mini Kit
(QIAGEN, Cat. No. 74904).

Degenerate primers were designed based on an alignment of known CBHII protein
sequences. The following primers were designed (see also SEQ ID NO:27 to 32).
CBHII 1S (SEQ ID NO:27): 5' TGG GGN CA(A/G) TG(T/C) GGN GG 3'

CBHII 2S (SEQ ID NO:28): 5' TGG (T/C)TN GGN TGG CCN GC 3'

CBHII 2AS (SEQ ID NO:29): 5' GCN GGC CAN CCN A(A/G)C CA 3' (reverse)

CBHII 3AS (SEQ ID NO:30): 5' TT(A/G) CAC CA(A/G) TCN CCC CA 3' (reverse)

CBHII 4AS (SEQ ID NO:31): 5' GG(T/C) TTN ACC CAN AC(A/G) AA 3' (reverse)

CBHII 5AS (SEQ ID NO:32): 5' AA(A/G) TAN GC(T/C) TG(A/G) AAC CA 3' (reverse)
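The degeneracy of each primer (the number of distinct sequences in the pool) follows from its ambiguity positions. The Python sketch below rewrites (A/G) as R and (T/C) as Y using the standard IUPAC codes, counts the degeneracy, and checks that the antisense primer 2AS is the reverse complement of the sense primer 2S.

```python
# IUPAC codes needed for the primers above: N = any base, R = (A/G), Y = (T/C).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "N": "ACGT"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "R": "Y", "Y": "R", "N": "N"}

PRIMERS = {
    "CBHII 1S": "TGG GGN CA(A/G) TG(T/C) GGN GG",
    "CBHII 2S": "TGG (T/C)TN GGN TGG CCN GC",
    "CBHII 2AS": "GCN GGC CAN CCN A(A/G)C CA",
    "CBHII 3AS": "TT(A/G) CAC CA(A/G) TCN CCC CA",
    "CBHII 4AS": "GG(T/C) TTN ACC CAN AC(A/G) AA",
    "CBHII 5AS": "AA(A/G) TAN GC(T/C) TG(A/G) AAC CA",
}


def to_iupac(primer: str) -> str:
    """Rewrite the '(A/G)' / '(T/C)' notation as R / Y and drop spaces."""
    return primer.replace(" ", "").replace("(A/G)", "R").replace("(T/C)", "Y")


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides in the degenerate primer pool."""
    count = 1
    for base in to_iupac(primer):
        count *= len(IUPAC[base])
    return count


def reverse_complement(primer: str) -> str:
    """Reverse complement in IUPAC notation (R and Y complement each other)."""
    return "".join(COMPLEMENT[base] for base in reversed(to_iupac(primer)))


for name, seq in PRIMERS.items():
    print(f"{name}: {to_iupac(seq)} (degeneracy {degeneracy(seq)})")

# Is the antisense primer 2AS the reverse complement of the sense primer 2S?
print(reverse_complement(PRIMERS["CBHII 2S"]) == to_iupac(PRIMERS["CBHII 2AS"]))
```

Degeneracy is worth tracking because each specific sequence makes up only a fraction of the primer pool; the 2S/2AS pair is the most degenerate set here.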



The 3' RACE system (GIBCO, Cat. No. 18373-019) was used to synthesize cDNA from
total RNA. About 5 microgram of total RNA was used as template, and the Adapter Primer (provided
by the 3' RACE system) was used to synthesize the first strand of cDNA. The cDNA was then
amplified using different combinations of the degenerate primers. The reaction mixture
comprised 2.5 microL 10x PCR buffer, 1.5 microL 25 mM MgCl2, 0.5 microL 10 mM dNTP mix,
0.5 microL 10 microM 3' primer, 0.5 microL AUAP (10 microM, provided by the 3' RACE
system), 0.5 microL Taq DNA polymerase (5 U/microL, Promega), 1 microL cDNA synthesis
reaction, and autoclaved distilled water to 25 microL.
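The recipe can be checked by computing the water volume and the final concentration of each component (C_final = C_stock x V_added / V_total). This sketch assumes that the MgCl2 addition is made once (1.5 microL) and uses the units as written in the recipe.

```python
# Volumes (microL) and stock concentrations from the recipe above.
# ASSUMPTION: MgCl2 is added once (1.5 microL of a 25 mM stock).
REACTION_VOLUME_UL = 25.0
COMPONENTS = [
    # (name, volume_ul, stock concentration or None, unit)
    ("PCR buffer", 2.5, 10.0, "x"),
    ("MgCl2", 1.5, 25.0, "mM"),
    ("dNTP mix", 0.5, 10.0, "mM"),
    ("3' primer", 0.5, 10.0, "microM"),
    ("AUAP", 0.5, 10.0, "microM"),
    ("Taq DNA polymerase", 0.5, 5.0, "U/microL"),
    ("cDNA synthesis reaction", 1.0, None, ""),
]


def water_to_add(components, total_ul):
    """Water volume that brings the mix up to total_ul."""
    water = total_ul - sum(vol for _, vol, _, _ in components)
    if water < 0:
        raise ValueError("components exceed the reaction volume")
    return water


def final_concentrations(components, total_ul):
    """C_final = C_stock * V_added / V_total for each component with a stock."""
    return {
        name: (stock * vol / total_ul, unit)
        for name, vol, stock, unit in components
        if stock is not None
    }


print(f"water: {water_to_add(COMPONENTS, REACTION_VOLUME_UL):.1f} microL")
for name, (conc, unit) in final_concentrations(COMPONENTS, REACTION_VOLUME_UL).items():
    print(f"{name}: {conc:g} {unit}")
```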
PCR was performed under the following conditions: the reaction was held at 94°C for 3
minutes, followed by 30 cycles of 94°C for 30 sec, 50°C for 30 sec and extension at 72°C for 1
minute. A final extension step at 72°C for 10 minutes followed by a 4°C hold step completed
the program.
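The programmed hold times add up as shown in this short sketch. It assumes the conventional reading of the three cycling steps as denaturation, annealing, and extension, and it ignores ramp times and the indefinite 4°C hold.

```python
# Timed steps of the thermocycler program above, in seconds.
# Ramp times between temperatures and the final 4 deg C hold are not included.
INITIAL_DENATURATION_S = 3 * 60     # 94 deg C
CYCLES = 30
CYCLE_STEPS_S = (30, 30, 60)        # 94 deg C, 50 deg C, 72 deg C
FINAL_EXTENSION_S = 10 * 60         # 72 deg C


def programmed_hold_time_s() -> int:
    """Sum of all timed steps in the program."""
    return INITIAL_DENATURATION_S + CYCLES * sum(CYCLE_STEPS_S) + FINAL_EXTENSION_S


total_s = programmed_hold_time_s()
print(f"{total_s} s ({total_s / 60:.0f} min) excluding ramps")
```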

PCR products of the expected size for each primer pair were recovered from a 1% agarose
(1X TBE) gel, then purified by incubation in a 60°C water bath followed by purification using the
GFX™ PCR DNA and Gel Band Purification Kit (Amersham Pharmacia Biotech Inc., Cat.
No. 27-9602-01). The concentrations of the purified products were determined by measuring the
absorbance at 260 and 280 nm (A260 and A280) in a spectrophotometer. The purified
fragments were then ligated into the pGEM-T Vector (Promega, Cat. No. A3600) according to
the kit instructions from Promega.
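The A260 reading is commonly converted to a DNA concentration with the standard factor of about 50 microgram/ml per absorbance unit for double-stranded DNA in a 1 cm path, and the A260/A280 ratio is used as a purity indicator. The readings and the 1:10 dilution in this sketch are hypothetical.

```python
# Standard approximation for double-stranded DNA: A260 = 1.0 in a 1 cm path
# corresponds to roughly 50 microgram/ml.
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dsdna_concentration(a260: float, dilution_factor: float = 1.0,
                        path_cm: float = 1.0) -> float:
    """Approximate dsDNA concentration (microgram/ml) of the undiluted sample."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor / path_cm


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280; a value near 1.8 is commonly taken as reasonably pure DNA."""
    return a260 / a280


# Hypothetical readings for a 1:10 dilution of a purified PCR product.
a260, a280 = 0.25, 0.14
print(f"{dsdna_concentration(a260, dilution_factor=10):.1f} microgram/ml")
print(f"A260/A280 = {purity_ratio(a260, a280):.2f}")
```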

Using the heat-shock method, 1 microL of the ligation product was transformed into 50
microL of JM109 high-efficiency competent cells. Transformation cultures were plated onto LB
plates with ampicillin/IPTG/X-Gal, and the plates were incubated overnight at 37°C.
Recombinant clones were identified by color screening on indicator plates and by colony PCR
screening. The positive clones were inoculated into 3 ml LB liquid medium and incubated
overnight at 37°C on a rotary shaker at 250 rpm. The cells were pelleted by centrifugation for
5 min at 10,000 x g, and plasmid samples were prepared from the cell pellet using the Minipreps
DNA Purification System (Promega, Cat. No. A7100). Finally, the plasmids were sequenced
with the BigDye Terminator Cycle Sequencing Ready Reaction Kit (PE) on an ABI 377
sequencer. The sequencing reaction was as follows: 4 microL Terminator Ready Reaction
Mix, 1.0-1.5 microgram plasmid DNA, 3.2 pmol primer, and dH2O to a final volume of 10
microL.
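Because the sequencing reaction is specified by mass and amount rather than by volume, the pipetting volumes depend on the stock concentrations. The helper below is a hypothetical sketch; the plasmid (0.5 microgram/microL) and primer (3.2 microM) stocks are assumed values, not taken from the text.

```python
def sequencing_mix(plasmid_ug_per_ul: float, primer_um: float,
                   plasmid_ug: float = 1.25, primer_pmol: float = 3.2,
                   terminator_ul: float = 4.0, final_ul: float = 10.0) -> dict:
    """Volumes (microL) for one sequencing reaction as described above."""
    if not 1.0 <= plasmid_ug <= 1.5:
        raise ValueError("the protocol uses 1.0-1.5 microgram plasmid DNA")
    dna_ul = plasmid_ug / plasmid_ug_per_ul
    primer_ul = primer_pmol / primer_um  # 1 microM equals 1 pmol per microL
    water_ul = final_ul - terminator_ul - dna_ul - primer_ul
    if water_ul < 0:
        raise ValueError("stocks are too dilute for the final volume")
    return {
        "Terminator Ready Reaction Mix": terminator_ul,
        "plasmid DNA": dna_ul,
        "primer": primer_ul,
        "dH2O": water_ul,
    }


# Hypothetical stocks: plasmid at 0.5 microgram/microL, primer at 3.2 microM.
for component, ul in sequencing_mix(0.5, 3.2).items():
    print(f"{component}: {ul:.2f} microL")
```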

Sequence analysis of the cDNA clones from different primer pairs showed that the
sequences contain coding regions of the CBHII gene. The primers were successfully used for
molecular screening of the CBHII gene from all tested fungal species: Chaetomium
thermophilum, Myceliophtora thermophila, Acremonium thermophilum, Melanocarpus sp.,
Thielavia microspore, Aspergillus sp., Thielavia australiensis, Aspergillus tubingensis,
Gloeophyllum trabeum, Meripilus giganteus, Trichophaea saccate, Stilbella anualata and
Malbrancheae cinnamonea. The identified CBH II encoding DNA sequences are shown as
SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11,
SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23
and SEQ ID NO:25. Full-length sequences were obtained from Aspergillus tubingensis,
Chaetomium thermophilum, Myceliophtora thermophila, Trichophaea saccate, Stilbella
anualata, and Malbrancheae cinnamonea. From Acremonium thermophilum, Melanocarpus sp.,
Thielavia microspore, Aspergillus sp., Thielavia australiensis, Gloeophyllum trabeum and
Meripilus giganteus, only partial sequences were obtained.

As an alternative to the method applied above, the cDNA library could be screened for the
full-length cDNA using standard hybridization techniques and the partial cDNA sequence as a
probe. The clones giving a positive hybridization signal with the probe are then purified and
sequenced to determine the longest cDNA sequence. Homology search and comparison
confirm that the full-length cDNA corresponds to the partial CBH II cDNA sequence that was
originally used as a probe.

The two approaches described above rely on the presence of the full-length CBH II
cDNA in the cDNA library or in the cDNAs used for its construction. Alternatively, the 5' and 3'
RACE (Rapid Amplification of cDNA Ends) techniques or derived techniques could be used
to identify the missing 5' and 3' regions. For this purpose, mRNAs are isolated and
used to synthesize first-strand cDNAs using an oligo(dT)-containing Adapter Primer or a
Gene Specific Primer (GSP).

The full-length cDNA of the CBH II gene can also be obtained by using genomic DNA.
The CBH II gene can be identified by PCR techniques such as the one described above or by
standard genomic library screening using hybridization techniques and the partial CBH II
cDNA as a probe. Homology search and comparison with the partial CBH II cDNA are used to
confirm that the genomic sequence corresponds to the CBH II gene. Identification of consensus
sequences such as the transcription initiation site, start and stop codons or polyA sites could
be used to define the region comprising the full-length cDNA. Primers constructed from both
the 5' and 3' ends of this region could then be used to amplify the full-length cDNA from
mRNA or a cDNA library (see above).
By expression of the full-length gene from a suitable expression construct in a host, the CBH II
enzyme is harvested from the culture as an intracellular or extracellular enzyme.



Example 2

Using BLAST, the protein sequences were compared to SWALL, ERDBP, and GenSeqP. If
the sequence was full length, the catalytic core was predicted using a PFAM HMM, and only
that core region was used to search the databases. The highest hits in the public databases
are listed below, except where the sequence is a duplicate of a sequence already present in
ERDB.
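Percent identity figures such as those below are computed over an alignment. The sketch assumes the two sequences are already aligned (gaps written as "-") and counts identical residues over all aligned columns, which is similar in spirit to how BLAST reports identities; it is not the BLAST algorithm itself, and the example sequences are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical residues divided by the number of aligned columns (gaps included)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    return 100.0 * identical / len(aligned_a)


# Hypothetical toy alignment, not taken from the sequences in this document.
print(f"{percent_identity('MKV-LSA', 'MKVALTA'):.1f}%")
```

Other tools divide by the length of the shorter sequence or exclude gap columns, so identity values are only comparable when computed the same way.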

Chaetomium thermopNIum NP000980 has 83% identity to the Humicola insolens 
avicelase II Glycosyl hydrolase in SWALL:Q9C1S9, family 6 domain 

Myceliophtora thermophila NP001130 has 79% protein identity to the H. insolens NCE2 
in geneseQDlaaw44827laaw44827 . 

Acremonium sp. T178-4 NP001132 has 74% protein identity to the Acremonium 
ceHulolyticus ceiiulase aeneseaplaaw25789laaw25789 . Glycosyl hydrolase family 6 domain. 

Melanocarpus sp. AT181-3 NP001133 has 91% protein identity to the H. insolens 
Cel6A fungal ceiiulase In aeneseaDlaaY01077laav01 077 . Glycosyl hydrolase family 6 domain 

Thielavia microspora T046-1 NP001134 has 79% protein identity to the H. insolens
cellulase NC2 in geneseqp|aaw44827|aaw44827. Glycosyl hydrolase family 6 domain.

Aspergillus sp. T186-2 NP001136 has 71% protein identity to the exocellobiohydrolase
in swall|q02321|q02321 of Phanerochaete chrysosporium. Glycosyl hydrolase family 6 domain.

Thielavia australiensis NP001000 has 77% protein identity to the H. insolens cellulase
NC2 protein in geneseqp|aaw44827|aaw44827. Glycosyl hydrolase family 6 domain.

Aspergillus tubingensis NP001143 has 67% protein identity to the CBH II in
SWALL:Q8NIB5 of Talaromyces emersonii. The DNA sequence entry is 94% identical to
NP001144 Gloeophyllum trabeum. Glycosyl hydrolase family 6 domain.

Gloeophyllum trabeum NP001144 has 67% protein identity to the CBH II in SWALL:Q8NIB5 of
Talaromyces emersonii. The DNA sequence entry is 94% identical to NP001143
Aspergillus tubingensis.

Example 3 

Sequencing of the Malbranchea cinnamomea CBH II gene and the Stilbella annulata CBH II
gene

The cDNA inserts in plasmid clone ZY043193, a cDNA encoding the Malbranchea
cinnamomea CBH II, and clone ZY040206, a cDNA encoding the Stilbella annulata CBH II, were
sequenced to phred quality values > 40, indicating high-confidence DNA sequence data. DNA
sequencing was performed on an ABI 3700 (ABI, Foster City, CA) according to the manufacturer's
protocols. Assembly of sequence data was performed using phred/phrap/consed (University of
Washington).



Example 4 

Construction of expression plasmids for the Malbranchea cinnamomea CBH II gene and
the Stilbella annulata CBH II gene

The clone and the nucleotide sequences of the Malbranchea cinnamomea CBH II gene
described above are used for subcloning of the gene and expression in an Aspergillus host.
A polymerase chain reaction approach is used to subclone the CBH II gene (without its own
promoter) from the isolated cDNA clone ZY043193 using primers designed from the nucleotide
sequences. In order to facilitate the subcloning of the gene fragment into the pAlLo 2
expression vector, BspHI and PacI restriction enzyme sites are introduced at the 5' and 3' ends
of the gene, respectively. The vector pAlLo 2 contains the TAKA promoter, NA2-tpi leader and
AMG terminator as regulatory sequences. The plasmid also contains the Aspergillus nidulans pyrG
gene as a selectable marker for fungal transformations. The following primers are used for the
PCR amplification process:

Primer F4 (forward): 5' GGGTGATGAGAGACTGTTTGTTCAG 3' (SEQ ID NO:33)

Primer R4 (reverse): 5' GGGTTAATTAATTAGAATGGGGGGTTGGCATTTG 3' (SEQ ID NO:34)

PCR is performed using Pwo polymerase (Boehringer Mannheim) according to the
manufacturer's specifications. The PCR-amplified product is gel isolated, cut with BspHI
and PacI, and gel purified. The purified fragment is ligated to the pAlLo 2 vector
(already cut with NcoI and PacI) to give the plasmid pEJG100, in which the transcription of the
M. cinnamomea CBH II gene is under the control of the TAKA promoter. The plasmid
pEJG100 is transformed into E. coli SoloPack Gold cells (Stratagene, La Jolla, CA). E. coli
transformants containing the pEJG100 plasmid are isolated and plasmid DNA is prepared for
transformation and expression in Aspergillus.
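In the cloning step above, an insert cut with BspHI can be ligated into pAlLo 2 cut with NcoI because both enzymes leave the same four-base 5' overhang (CATG), while PacI is used at the other end. As a minimal sketch, the helpers below scan a primer for a recognition site and derive the sticky end left by a palindromic site. The recognition sequences and cut positions are standard enzyme data rather than values given in this document, and the helper names are hypothetical.

```python
# Recognition sequence and top-strand cut position (bases before the cut).
ENZYMES = {
    "BspHI": ("TCATGA", 1),    # T^CATGA
    "NcoI": ("CCATGG", 1),     # C^CATGG
    "PacI": ("TTAATTAA", 5),   # TTAAT^TAA
}


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of site in seq."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site]


def sticky_end(site: str, cut: int) -> tuple[str, str]:
    """Overhang left by a palindromic site cut symmetrically on both strands."""
    mirror = len(site) - cut  # equivalent cut position on the other strand
    if cut < mirror:
        return ("5'", site[cut:mirror])
    if cut > mirror:
        return ("3'", site[mirror:cut])
    return ("blunt", "")


r4 = "GGGTTAATTAATTAGAATGGGGGGTTGGCATTTG"  # primer R4 as printed above
print(find_sites(r4, ENZYMES["PacI"][0]))  # [3]
print(sticky_end(*ENZYMES["BspHI"]), sticky_end(*ENZYMES["NcoI"]))
```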

The clone and the nucleotide sequences of the Stilbella annulata CBH II gene described
above are used for subcloning of the gene and expression in an Aspergillus host. A polymerase
chain reaction approach is used to subclone the CBH II gene (without its own promoter) from
the isolated cDNA clone ZY040206 using primers designed from the nucleotide sequences. In
order to facilitate the subcloning of the gene fragment into the pAlLo 2 expression vector, PCR
primers were designed to contain restriction sites compatible with the cloning sites of pAlLo 2
(NcoI and PacI) and to satisfy the overlap requirements of the In-Fusion PCR kit protocol
(Clontech, Palo Alto, CA). The following primers are used for the PCR amplification process:
Primer F4.1 (forward): 5' AGTGGATTTAGGATGGGGGGTCGATTGTTGG 3' (SEQ ID NO:35)
Primer R4.1 (reverse): 5' AGTCAGGTGTAGTTATTAGAAGGGGGGGTTG 3' (SEQ ID NO:36)
The PCR product was generated using Pfx enzyme (Life Technologies) with 1x
enhancer. The 1400 bp product was gel excised, purified with QIAquick (Qiagen, Valencia,
CA) and ligated into pAlLo 2 with the In-Fusion reaction. The resulting plasmid, pEJG96, is
transformed into E. coli SoloPack Gold cells (Stratagene, La Jolla, CA). E. coli transformants
containing the pEJG96 plasmid are isolated and plasmid DNA is prepared for transformation
and expression in Aspergillus.



Example 3 

Transformation of Aspergillus oryzae

Protoplasts are prepared from A. oryzae strain JAL 250, in which the pyrG gene of the
host strain is deleted. Protoplast preparation and transformation are done as previously
described (Christensen et al., supra). A. oryzae transformants expressing orotidine
monophosphate decarboxylase are selected based on their ability to grow in the absence of
uracil. Transformants are spore purified twice on selective plates and the spore-purified
transformants are used for further analysis.

Example 4 

Expression of Malbranchea cinnamomea CBH II gene and the Stilbella annulata CBH II gene in
A. oryzae

The transformants are screened for CBH II expression in shake flasks (25 ml medium in
125 ml flasks) using a medium that contains the following in g/L: maltose, 60; MgSO4·7H2O, 2.0;
KH2PO4, 10.0; K2SO4, 2.0; citric acid, 2.0; yeast extract, 10.0; AMG trace metal solution, 0.5 ml;
urea, 2.0. The pH of the medium is adjusted to 6.5 before sterilization by autoclaving. Flasks are
inoculated with freshly harvested spores and incubated in a shaker (200 rpm) at 34°C. Culture
supernatants are harvested at 5 days. Five microliters of each culture supernatant is run on
8-16% Tris-Glycine gels. For the Malbranchea cinnamomea CBH II, the predicted molecular
weight of the protein is 43 kDa; a smear, significant over background, running at about 50 kDa
is seen in the transformants. For the Stilbella annulata CBH II, the predicted molecular weight
of the protein is 49 kDa; a band, significant over background, runs at about 55 kDa in the
transformant.
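Predicted molecular weights such as the 43 kDa and 49 kDa values above are the kind of figure obtained by summing average amino acid residue masses over a deduced protein sequence. A minimal sketch follows; the residue mass table is standard reference data rather than something given in this document, and the function name is hypothetical. It returns the mass of the bare polypeptide chain only, with no post-translational modification, so it is not expected to equal the apparent size on an SDS gel.

```python
# Average residue masses (Da) of the 20 standard amino acids
# (free amino acid mass minus one water).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the free N- and C-termini


def predicted_mw_kda(protein: str) -> float:
    """Average molecular weight (kDa) of an unmodified polypeptide."""
    seq = "".join(protein.split()).upper()  # drop whitespace and line breaks
    unknown = set(seq) - set(AVG_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return (sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER) / 1000.0


print(round(predicted_mw_kda("MKV"), 3))  # 0.377 (toy tripeptide)
```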

Example 5 

Phosphoric acid swollen cellulose (PASC) was prepared as described by Schülein (1997, J.
Biotechnol. 57: 71-81). Protein concentrations were determined using a BCA Protein
Assay (Pierce) according to the manufacturer's instructions. Protein aliquots were examined on
8-16% acrylamide gradient gels (Invitrogen) and stained with Bio-Safe Coomassie Stain (Bio-Rad).

Aspergillus oryzae broths expressing the Stilbella annulata Cel6A (~55 kDa) and the
Malbranchea cinnamomea Cel6B (~49 kDa) enzymes were concentrated using Centricon Plus-20
(Millipore) filtering devices in a swinging-bucket rotor centrifuge (Sorvall RC3B Plus; total
time of ~25 minutes at 3000 rpm). Approximately 3 ml of each concentrate was loaded onto a
10DG Econo-Pac column (Bio-Rad) equilibrated with 50 mM sodium acetate pH 5.0, and the
desalted material was eluted with 4 ml of 50 mM sodium acetate, pH 5.0. The protein
concentration of each sample was determined and aliquots were analyzed on 8-16% acrylamide
gradient gels. A PASC activity assay (endpoint assay) was performed in a 96-well
microplate format. Briefly, 10 microL of appropriately diluted glucose standards (2 mg/ml to
0.25 mg/ml) were placed in wells containing 190 microL of 50 mM sodium acetate buffer pH
5.0 with 0.5 mg/ml BSA (Dilution buffer). Reagent controls (200 microL Dilution buffer),
Sample controls (10 microL of the dilution to be assayed plus 190 microL Dilution buffer) and
Substrate controls (10 microL Dilution buffer plus 190 microL 2 g/L PASC in Dilution buffer)
were included in each assay. A set of serial dilutions was generated for each sample to be
assayed and 10 microL of each dilution was placed in its designated well. Reactions were
initiated by the addition of 190 microL of 2 g/L PASC. Samples were mixed and the plates
placed in a 50°C water bath for 30 minutes. The reactions were stopped by the addition of
500 microL of 0.5 M NaOH to each well. Plates were centrifuged (Sorvall RT7) for 5 minutes
at 2000 rpm and 100 microL aliquots of each sample were transferred to a 96-well microtiter plate
with conical wells. Determination of reducing sugar content was initiated by adding 50 microL
of 1.5% (w/v) p-hydroxybenzoic acid hydrazide (PHBAH) to each well and incubating the
plate at 95°C for 10 minutes. The plate was allowed to cool to room temperature and 50
microL of double-distilled H2O was added to each well. At this time, 100 microL aliquots from each
well were transferred to a flat-bottomed 96-well microtiter plate and the OD410 was read using a
SpectraMax plate reader.

The glucose standards prepared for the PHBAH portion of the assay were used to
construct a glucose standard curve (A410 vs glucose concentration in mg/ml). The slope and
intercept from this standard curve were used to generate a second graph in which the
micromoles reducing sugar/min/ml was plotted vs protein concentration (mg/ml) to give the
specific activities (IU/mg) of the samples assayed at 50°C. The specific activity for Stilbella
annulata was 0.24 IU/mg and for Malbranchea cinnamomea 1.40 IU/mg.
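The two-step calculation above (glucose standard curve, then reducing-sugar rate plotted against protein concentration) can be sketched in Python. This is a minimal sketch under stated assumptions: glucose equivalents read off the standard curve are taken to be on the same basis as the samples (standards and samples are both dispensed as 10 microL into 190 microL), a single blank reading (for example a substrate control) is subtracted, and any dilution factors are already folded into the protein concentrations. The function names and example readings are hypothetical.

```python
GLUCOSE_MW = 180.16   # g/mol, so 1 mg/ml glucose = 1000/180.16 micromol/ml
REACTION_MIN = 30.0   # PASC incubation time at 50 C


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept of y = slope*x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def specific_activity(std_mg_per_ml, std_a410, protein_mg_per_ml,
                      sample_a410, blank_a410, minutes=REACTION_MIN):
    """Specific activity in IU/mg (micromol reducing sugar/min per mg protein).

    ASSUMPTION: standard-curve glucose equivalents share the samples' basis.
    """
    m, b = linear_fit(std_mg_per_ml, std_a410)    # A410 = m*C + b
    blank_glc = (blank_a410 - b) / m              # glucose equivalent of blank
    rates = [((a - b) / m - blank_glc) * 1000.0 / GLUCOSE_MW / minutes
             for a in sample_a410]                # micromol/min/ml
    sa, _ = linear_fit(protein_mg_per_ml, rates)  # slope = IU/mg
    return sa


sa = specific_activity([0.25, 0.5, 1.0, 2.0], [0.175, 0.3, 0.55, 1.05],
                       [0.1, 0.2, 0.4], [0.1, 0.15, 0.25], blank_a410=0.05)
print(round(sa, 3))  # 0.185 for these made-up readings
```

Taking the slope over a dilution series, rather than dividing a single rate by a single protein concentration, averages out pipetting noise and exposes any non-linearity at high enzyme loading.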

Example 6 

Cellobiohydrolase Activity 

A cellobiohydrolase is characterized by the ability to hydrolyze highly crystalline
cellulose very efficiently compared to other cellulases. A cellobiohydrolase may have a higher
catalytic activity using PASC (phosphoric acid swollen cellulose) as substrate than using
CMC as substrate. For the purposes of the present invention, any of the following assays can
Activity on Azo-Avicel

Azo-Avicel (Megazyme, Bray Business Park, Bray, Wicklow, Ireland) was used
according to the manufacturer's instructions.

Activity on PNP-beta-cellobiose

50 microL CBH substrate solution (5 mM PNP-beta-D-cellobiose (p-nitrophenyl
beta-D-cellobioside, Sigma N-5759) in 0.1 M Na-acetate buffer, pH 5.0) was mixed with 1 mL
substrate solution and incubated 20 minutes at 40°C. The reaction was stopped by addition
of 5 mL stop reagent (0.1 M Na-carbonate, pH 11.5). Absorbance was measured at 404 nm.
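The PNP-cellobiose readout can be converted to an amount of released p-nitrophenol with the Beer-Lambert law. The document gives only the wavelength, so the molar absorption coefficient (about 18,300 M^-1 cm^-1 for p-nitrophenolate in alkaline solution), the 1 cm path length and the blank handling below are assumptions, and the function name is hypothetical; a p-nitrophenol standard measured under the same stop conditions would give a more reliable coefficient.

```python
EPSILON_PNP = 18_300.0  # M^-1 cm^-1, ASSUMED value for p-nitrophenolate near 405 nm


def pnp_rate_umol_per_min(a404, total_volume_ml, minutes=20.0,
                          epsilon=EPSILON_PNP, path_cm=1.0, a404_blank=0.0):
    """Micromol p-nitrophenol released per minute (Beer-Lambert: c = A/(eps*l)).

    ASSUMPTIONS: epsilon and 1 cm path as above; total_volume_ml is the
    stopped reaction volume in which the absorbance was read.
    """
    molar = (a404 - a404_blank) / (epsilon * path_cm)   # mol/L
    umol = molar * (total_volume_ml / 1000.0) * 1e6     # mol/L x L -> micromol
    return umol / minutes


# Volumes as described above: 0.05 mL + 1 mL + 5 mL stop reagent (made-up A404).
print(round(pnp_rate_umol_per_min(1.83, 0.05 + 1.0 + 5.0), 5))  # 0.03025
```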

Activity on PASC and CMC

The substrate is degraded by cellobiohydrolase to form reducing sugars. A
Microdochium nivale carbohydrate oxidase (rMnO) or another equivalent oxidase acts on the
reducing sugars to form H2O2 in the presence of O2. In the presence of excess peroxidase, the
H2O2 formed drives the oxidative condensation of 4-aminoantipyrine (AA) and
N-ethyl-N-sulfopropyl-m-toluidine (TOPS) to form a purple product which can be quantified by
its absorbance at 550 nm.

When all components except cellobiohydrolase are in surplus, the rate of increase in
absorbance is proportional to the cellobiohydrolase activity. The reaction is a one-kinetic-step
reaction and may be carried out automatically in a Cobas Fara centrifugal analyzer
(Hoffmann-La Roche) or another equivalent spectrophotometer that can measure steady-state
kinetics.

Buffer: 50 mM Na-acetate buffer (pH 5.0);

Reagents: rMnO oxidase, purified Microdochium nivale carbohydrate oxidase, 2 mg/L;
Peroxidase, SIGMA P-8125 (96 U/mg), 25 mg/L;
4-aminoantipyrine, SIGMA A-4382, 200 mg/L;
TOPS, SIGMA E-8506, 600 mg/L;
PASC or CMC (see below), 5 g/L.
All reagents were added to the buffer in the concentrations indicated above and this
reagent solution was mixed thoroughly.

50 microL cellobiohydrolase II sample (in a suitable dilution) was mixed with 300 microL
reagent solution and incubated 20 minutes at 40°C. Purple color formation was detected and
measured as absorbance at 550 nm.
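Because the rate of absorbance increase is proportional to activity, converting it to micromoles per minute needs only the slope of A550 against time and the AA/TOPS-condensate coefficient of 0.01935 A550/(microM cm) stated for this assay. A minimal sketch, assuming a 1 cm light path, the 50 microL sample plus 300 microL reagent volume described here, and that the coefficient applies per micromolar of reducing sugar; the function names are hypothetical.

```python
AA_TOPS_COEFF = 0.01935  # A550 per (microM * cm), value stated for this assay


def slope_per_min(times_min, a550):
    """Least-squares slope of A550 versus time, i.e. the steady-state dA550/min."""
    n = len(times_min)
    mean_t, mean_a = sum(times_min) / n, sum(a550) / n
    num = sum((t - mean_t) * (a - mean_a) for t, a in zip(times_min, a550))
    den = sum((t - mean_t) ** 2 for t in times_min)
    return num / den


def reducing_sugar_rate(times_min, a550, reaction_ml=0.35, sample_ml=0.05,
                        path_cm=1.0):
    """Micromol reducing sugar per minute per ml of (diluted) enzyme sample.

    ASSUMPTIONS: 1 cm light path; the coefficient is read as A550 per microM
    of reducing sugar. Defaults: 50 microL sample + 300 microL reagent.
    """
    micromolar_per_min = slope_per_min(times_min, a550) / (AA_TOPS_COEFF * path_cm)
    umol_per_min = micromolar_per_min * reaction_ml / 1000.0  # microM x litres
    return umol_per_min / sample_ml


# Made-up readings rising by 0.01935 A550/min, i.e. 1 microM/min.
t = [0.0, 1.0, 2.0, 3.0]
print(reducing_sugar_rate(t, [0.10 + 0.01935 * x for x in t]))
```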

The AA/TOPS-condensate absorption coefficient is 0.01935 A550/(microM cm). The
rate is calculated as micromoles reducing sugar produced per minute from OD550/minute and
PASC:

Materials: 5 g Avicel® (Art. 2331 Merck);
150 mL 85% ortho-phosphoric acid (Art. 573 Merck);
800 mL acetone (Art. 14 Merck);
Approx. 2 liters deionized water (Milli-Q);
1 L glass beaker;
1 L glass filter funnel;
2 L suction flask;
Ultra Turrax homogenizer.

Acetone and ortho-phosphoric acid are cooled on ice. Avicel® is moistened with water, and
then the 150 mL ice-cold 85% ortho-phosphoric acid is added. The mixture is placed on an
ice bath with weak stirring for one hour.

Add 500 mL ice-cold acetone with stirring, transfer the mixture to a glass filter
funnel and wash with 3 x 100 mL ice-cold acetone, sucking as dry as possible in each wash.
Wash with 2 x 500 mL water (or until there is no odor of acetone), sucking as dry as possible
in each wash.

Re-suspend the solids in water to a total volume of 500 mL, and blend to homogeneity
using an Ultra Turrax homogenizer. Store wet in a refrigerator and equilibrate with buffer by
centrifugation and re-suspension before use.

CMC: 

Bacterial cellulose microfibrils in an impure form were obtained from the Japanese
foodstuff "nata de coco" (Fujico Company, Japan). The cellulose in 350 g of this product was
purified by suspending the product in about 4 L of tap water. This water was replaced by
fresh water twice a day for 4 days.

Then 1% (w/v) NaOH was used instead of water and the product was re-suspended in
the alkali solution twice a day for 4 days. Neutralisation was done by rinsing the purified
cellulose with distilled water until the pH at the surface of the product was neutral (pH 7).

The cellulose was microfibrillated and a suspension of individual bacterial cellulose
microfibrils was obtained by homogenisation of the purified cellulose microfibrils in a Waring
blender for 30 min. The cellulose microfibrils were further purified by dialysing this
suspension through a pore membrane against distilled water, and the isolated and purified
cellulose microfibrils were stored in a water suspension at 4°C.

Example 7 
Expression of Malbranchea cinnamomea CBH II gene in A. oryzae

The Malbranchea cinnamomea CBH II gene was expressed in Aspergillus oryzae and an
enzyme of approximately 42 kDa was purified to a purity of 95%. The activity was 1650 pnp-BDQ.

Example 8

Two recombinantly expressed (Aspergillus oryzae) CBH II enzymes from Stilbella
annulata (Cel6A) and Malbranchea cinnamomea (Cel6B) were assayed for enzymatic activity
on phosphoric acid swollen cellulose (PASC).

Aspergillus oryzae broths expressing recombinant Stilbella annulata Cel6A (~55 kDa)
and Malbranchea cinnamomea Cel6B (~49 kDa) were concentrated using Centricon Plus-20
filtering devices in a swinging-bucket rotor (Sorvall RC3B Plus; ~25 minutes at
3,000 rpm). Approximately 3 ml of each concentrate was loaded onto a 10DG Econo-Pac
column (Bio-Rad) equilibrated with 50 mM sodium acetate pH 5.0, and the desalted material
was eluted with 4 ml of 50 mM sodium acetate pH 5.0. The protein concentration of each sample
was determined using a BCA Protein Assay Kit (Pierce) and aliquots were analyzed on 8-16%
acrylamide gradient gels (Invitrogen).

A PASC activity assay was performed in a 96-well microplate format. Briefly, 10
microL of an appropriate glucose standard (2 mg/ml to 0.25 mg/ml) was placed in a well
containing 190 microL of 50 mM sodium acetate buffer pH 5.0 with 0.5 mg/ml BSA (Dilution
buffer). Reagent controls (200 microL dilution buffer), sample controls (10 microL of the
dilution to be assayed plus 190 microL dilution buffer) and substrate controls (10 microL
dilution buffer plus 190 microL 2 g/L PASC in dilution buffer) were also run. A series of serial
dilutions was set up for each sample and 10 microL of each dilution was placed in its
designated well. Reactions were initiated by adding 190 microL of 2 g/L PASC. Plates were
covered and placed in a 50°C water bath for 30 minutes. Reactions were stopped by the
addition of 500 microL of 0.5 M NaOH to each well. Plates were centrifuged (Sorvall RT7) for
5 minutes at 2000 rpm. Approximately 100 microL aliquots of each sample were transferred to
a 96-well microtiter plate with conical wells. Each well then received 50 microL of 1.5%
p-hydroxybenzoic acid hydrazide (PHBAH) and was mixed thoroughly. Plates were incubated
at 95°C for 10 minutes. Following the incubation step, plates were cooled to room temperature
and 50 microL of ddH2O was added to each well. One hundred microL aliquots from each well
were transferred to flat-bottomed 96-well microtiter plates and the OD at 410 nm was read
using a SpectraMax plate reader.

Using the glucose standard curve (A410 vs glucose in mg/ml) generated for the
PASC assay, the slope and intercept from this curve were used to construct a second graph in
which the micromoles reducing sugar/min/ml was plotted vs protein concentration (mg/ml) to
give the specific activities (IU/mg) for the enzyme samples assayed. In determining specific
activity (SA) on PASC, only percent conversions of less than 2% were used.
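The 2% conversion limit can be applied to each data point before the specific-activity fit. A minimal sketch, assuming that each well holds 190 microL of 2 g/L PASC (0.38 mg), that glucose equivalents read off the standard curve refer to a 10 microL aliquot (standards and samples are both dispensed as 10 microL), and that released sugar is expressed as cellulose with the conventional 162/180 anhydroglucose factor. None of these conversion choices is spelled out in the text, and the names are hypothetical.

```python
PASC_MG_PER_WELL = 0.190 * 2.0   # 190 microL of 2 g/L (= 2 mg/ml) PASC -> 0.38 mg
ANHYDRO = 162.14 / 180.16        # ASSUMED glucose -> anhydroglucose mass ratio
STD_VOLUME_ML = 0.010            # standards dispensed as 10 microL per well


def percent_conversion(glucose_equiv_mg_per_ml, substrate_mg=PASC_MG_PER_WELL):
    """Percent of the PASC in a well converted, from standard-curve glucose equivalents.

    ASSUMPTION: the standard-curve value refers to a 10 microL aliquot, so the
    glucose in the well is (mg/ml) * 0.010 ml.
    """
    glucose_mg = glucose_equiv_mg_per_ml * STD_VOLUME_ML
    return 100.0 * glucose_mg * ANHYDRO / substrate_mg


def usable_points(points, limit_percent=2.0):
    """Keep (protein_mg_per_ml, glucose_equiv_mg_per_ml) pairs below the limit."""
    return [(p, g) for p, g in points if percent_conversion(g) < limit_percent]


# Made-up readings: the second point (~2.4% conversion) is excluded.
print(usable_points([(0.1, 0.5), (0.4, 1.0)]))
```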




wo 2004/056981 



PCT/DK2003/000914 



Hydrolysis of PCS was conducted in 1.1 ml Immunoware microtubes (Pierce) using
a total reaction volume of 1.0 ml. In this protocol, hydrolysis of PCS (20 mg/ml in 60 mM
sodium acetate pH 5.0 buffer) was performed using different protein loadings (expressed as
mg enzyme per gram PCS) of a Thielavia terrestris broth or Celluclast 1.5L sample in the
presence of 3% Aspergillus oryzae beta-glucosidase (3% of the cellulase protein loading).
The PCS-hydrolyzing capability of Thielavia was characterized at multiple temperatures:
40°C, 50°C and 65°C (Isotemp 102S water baths). Typically, reactions were run in duplicate
and aliquots were taken during the course of hydrolysis (t = 0, 2, 4, 6, 8 and 24 hours). PCS
hydrolysis reactions were stopped by mixing a 20 microL aliquot of each hydrolyzate with 180
microL of 0.44% NaOH (Stop reagent). Appropriate serial dilutions were generated for each
sample and the reducing sugar content was determined using a p-hydroxybenzoic acid hydrazide
(PHBAH) assay adapted to a 96-well microplate format. Briefly, a 90 microL aliquot of an
appropriately diluted sample was placed in a 96-well conical-bottomed microplate. Reactions
were initiated by adding 60 microL of 1.5% (w/v) PHBAH in 2% NaOH to each well. Plates
were heated uncovered at 95°C for 10 minutes. Plates were allowed to cool to RT and
50 microL of ddH2O was added to each well. A 100 microL aliquot from each well was transferred
to a flat-bottomed 96-well plate and the absorbance at 410 nm was measured using a
SpectraMax Microplate Reader (Molecular Devices). Glucose standards (0.1-0.0125 mg/ml
diluted with 0.4% sodium hydroxide) were used to prepare a standard curve to translate the
obtained A410 values into glucose equivalents. The resultant equivalents were used to
calculate the percentage of PCS cellulose conversion for each reaction. Our benchmark
conditions for Celluclast 1.5L PCS hydrolysis were the following: 50 mg/ml PCS in 50 mM
sodium acetate pH 5.0, ~21 mg enzyme/g PCS (equal to ~10 FPU), in the absence of
externally added beta-glucosidase, at 3Q°C.
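The enzyme-loading and conversion arithmetic of this protocol can be sketched in Python. Loading is expressed per gram PCS, beta-glucosidase is dosed at 3% of the cellulase protein, and the 20 microL into 180 microL stop step gives a 10-fold dilution before any further serial dilution. Turning glucose equivalents into percent cellulose conversion also needs the cellulose content of the PCS, which the document does not state, so it appears below as a hypothetical parameter; the 162/180 anhydroglucose factor is likewise an assumed convention, and the function names are hypothetical.

```python
ANHYDRO = 162.14 / 180.16  # ASSUMED glucose -> anhydroglucose mass ratio


def enzyme_doses_mg(pcs_mg_per_ml, volume_ml, loading_mg_per_g, bg_fraction=0.03):
    """Cellulase and beta-glucosidase protein (mg) for one PCS reaction."""
    pcs_g = pcs_mg_per_ml * volume_ml / 1000.0
    cellulase_mg = loading_mg_per_g * pcs_g
    return cellulase_mg, cellulase_mg * bg_fraction


def pcs_conversion_percent(measured_glucose_mg_per_ml, dilution_factor,
                           pcs_mg_per_ml, cellulose_fraction):
    """Percent of PCS cellulose converted, from PHBAH glucose equivalents.

    cellulose_fraction is a HYPOTHETICAL input: the document does not give
    the cellulose content of PCS.
    """
    glucose_mg_per_ml = measured_glucose_mg_per_ml * dilution_factor
    cellulose_mg_per_ml = pcs_mg_per_ml * cellulose_fraction
    return 100.0 * glucose_mg_per_ml * ANHYDRO / cellulose_mg_per_ml


# 1.0 ml reaction, 20 mg/ml PCS, illustrative loading of 5 mg enzyme/g PCS.
print(enzyme_doses_mg(20, 1.0, 5.0))
# Stopped 1:10, read 0.1 mg/ml glucose, with an assumed 50% cellulose content.
print(pcs_conversion_percent(0.1, 10, 20, 0.5))
```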

Aspergillus oryzae broths expressing the CBH II enzymes from Stilbella annulata
(Cel6A) and Malbranchea cinnamomea (Cel6B) were desalted and concentrated, and their protein
concentrations were determined as described in the materials and methods. Analysis of these
recombinant protein samples on an 8-16% acrylamide gradient gel indicates that the Stilbella
Cel6A enzyme (Figure 1, lane #1) has an apparent molecular weight of ~55 kDa while that of
Malbranchea Cel6B is ~49 kDa (Figure 1, lane #2).

To determine whether these recombinant enzymes were enzymatically active,
hydrolysis reactions were conducted using a PASC substrate. Under the conditions described
previously, Stilbella annulata (Cel6A) and Malbranchea cinnamomea (Cel6B) had specific
activities of 0.24 IU/mg and 1.40 IU/mg, respectively.

Deposit of Biological Material 

China General Microbiological Culture Collection Center (CGMCC) 

The following biological material has been deposited on Dec 19, 2002 under the terms of
the Budapest Treaty with the China General Microbiological Culture Collection Center
(CGMCC), Institute of Microbiology, Chinese Academy of Sciences, Haidian, Beijing 100080,
China:



Accession Number: CGMCC 0859
Applicants reference: NP000980
Description: Chaetomium thermophilum
Classification: Chaetomiaceae, Sordariales, Ascomycota
Related sequence(s): SEQ ID NO:1, SEQ ID NO:2

Accession Number: CGMCC 0862
Applicants reference: NP001130
Description: Myceliophthora thermophila
Classification: Chaetomiaceae, Sordariales, Ascomycota
Related sequence(s): SEQ ID NO:3, SEQ ID NO:4

Accession Number: CGMCC 0857
Applicants reference: NP001132
Description: Acremonium sp. T178-4
Classification: mitosporic Ascomycetes
Related sequence(s): SEQ ID NO:5, SEQ ID NO:6

Accession Number: CGMCC 0861
Applicants reference: NP001133
Description: Melanocarpus sp.
Classification: Trichocomaceae, Eurotiales, Ascomycota
Related sequence(s): SEQ ID NO:7, SEQ ID NO:8

Accession Number: CGMCC 0863
Applicants reference: NP001134
Description: Thielavia microspora
Classification: Chaetomiaceae, Sordariales, Ascomycota
Related sequence(s): SEQ ID NO:9, SEQ ID NO:10

Accession Number: CGMCC 0858
Applicants reference: NP001136
Description: Aspergillus sp. T186-2
Classification: Trichocomaceae, Eurotiales, Ascomycota
Related sequence(s): SEQ ID NO:11, SEQ ID NO:12

Accession Number: CGMCC 0864
Applicants reference: NP001000
Description: Thielavia australiensis
Classification: Chaetomiaceae, Sordariales, Ascomycota
Related sequence(s): SEQ ID NO:13, SEQ ID NO:14
American Type Culture Collection (ATCC)

The following biological material is obtainable from American Type Culture Collection, 
P.O. Box 1549, Manassas, VA 20108, USA. 



Accession Number: ATCC 11.39
Applicants reference: NP001144
Description: Gloeophyllum trabeum
Classification:
Related sequence(s): SEQ ID NO:17, SEQ ID NO:18

Centraalbureau voor Schimmelcultures (CBS)

The following biological material is obtainable from Centraalbureau voor
Schimmelcultures (CBS), Uppsalalaan 8, 3584 CT Utrecht, The Netherlands (alternatively
P.O. Box 85167, 3508 AD Utrecht, The Netherlands):

Accession Number: CBS 161.79
Applicants reference: NP001143
Description: Aspergillus tubingensis
Classification: -
Related sequence(s): SEQ ID NO:15, SEQ ID NO:16

Accession Number: CBS 521.95
Applicants reference: ND001631
Description: Meripilus giganteus
Classification:
Related sequence(s): SEQ ID NO:19, SEQ ID NO:20

Accession Number: CBS 804.70
Applicants reference: NP000960
Description: Trichophaea saccata
Classification:
Related sequence(s): SEQ ID NO:21, SEQ ID NO:22

Accession Number: CBS 185.70
Applicants reference: NP001040
Description: Stilbella annulata
Classification:
Related sequence(s): SEQ ID NO:23, SEQ ID NO:24

Accession Number: CBS 115.68
Applicants reference: NP001045
Description: Malbranchea cinnamomea
Classification:
Related sequence(s): SEQ ID NO:25, SEQ ID NO:26